The Consistency Problem in AI Art
AI image generation has a dirty secret: it’s excellent at producing single, standalone images, but it struggles with consistency. Ask any AI tool to generate the same character in ten different poses, and you’ll get ten slightly different characters. The face drifts. The proportions shift. The clothing changes. The distinctive details that make a character recognizable are lost between generations.
For casual use, this inconsistency is acceptable—each image stands alone, and minor variations don’t matter. For professional creative work, it’s a dealbreaker. A character designer needs the same character across a design sheet. A book illustrator needs the same protagonist across 30 chapters. A game studio needs the same hero across hundreds of in-game assets.
Leonardo Phoenix, Leonardo.ai’s proprietary generation model, addresses consistency not as an afterthought but as a core design principle. Combined with the platform’s LoRA training capabilities, it creates a system where generating consistent characters and maintaining visual coherence across large projects is both achievable and practical.
How Leonardo Phoenix Achieves Consistency
Architecture-Level Design
Leonardo Phoenix was trained with character consistency as an explicit objective. Unlike general-purpose diffusion models that optimize for diversity (producing varied outputs for the same prompt), Phoenix includes training signals that reward visual coherence when generating related images.
The practical effect: when you provide Phoenix with a character reference and ask for the same character in a different pose or setting, the model maintains more stable features than competing models. Facial structure, body proportions, and distinctive characteristics transfer more reliably between generations.
Character Reference Embeddings
Phoenix supports a character reference system that creates a compressed identity representation from uploaded reference images. This embedding captures:
- Facial geometry (bone structure, eye shape, nose profile)
- Distinctive features (scars, birthmarks, hair style, glasses)
- Body proportions (height, build, limb ratios)
- Skin tone and complexion
When generating new images, the character embedding is injected into the generation pipeline, guiding the diffusion process toward output that matches the reference identity.
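As a conceptual illustration (not Leonardo’s actual pipeline), an identity embedding can be built by averaging per-image face embeddings and blending the result into the prompt conditioning. The `identity_embedding` and `condition` helpers below are hypothetical, and `weight` plays the role of a character-reference strength slider:

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def identity_embedding(reference_embeddings):
    """Average per-image face embeddings into one compact identity vector.

    Averaging and re-normalizing is a common baseline for collapsing several
    reference images into a single identity representation.
    """
    count = len(reference_embeddings)
    dim = len(reference_embeddings[0])
    mean = [sum(e[i] for e in reference_embeddings) / count for i in range(dim)]
    return normalize(mean)

def condition(text_emb, id_emb, weight=0.6):
    """Blend the identity vector into the prompt conditioning.

    Higher `weight` pulls generations toward the reference identity,
    analogous to a character-reference strength setting.
    """
    mixed = [(1 - weight) * t + weight * c for t, c in zip(text_emb, id_emb)]
    return normalize(mixed)
```

At `weight=0` the conditioning reduces to the plain text embedding; at `weight=1` it is pure identity, which in practice tends to override the prompt.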
LoRA Fine-Tuning for Deep Consistency
For the highest level of character consistency, Leonardo’s in-platform LoRA training goes further than character references. By training a dedicated LoRA on a specific character, users create a model modification that deeply encodes the character’s visual identity.
A character LoRA trained on 30-50 reference images produces substantially better consistency than a reference embedding alone. The LoRA captures not just what the character looks like, but how the character’s features behave under different lighting, angles, and expressions.
The training workflow:
- Prepare 30-50 images of the character from different angles, with different expressions, in different lighting
- Upload to Leonardo’s training interface
- Configure training parameters (Leonardo provides recommended defaults for character training)
- Train (typically 15-30 minutes of processing)
- Generate using the trained LoRA with adjustable weight
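The adjustable weight in the last step maps directly onto the LoRA formulation from Hu et al.: the adapter stores two small matrices A (r × in_dim) and B (out_dim × r), and generation applies W + weight · (B @ A) to each frozen base weight W. The sketch below shows that general mechanism, not Leonardo’s internal code:

```python
def matmul(A, B):
    """Multiply two matrices represented as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def apply_lora(W, A, B, weight=1.0):
    """Return W + weight * (B @ A), the low-rank update LoRA adds to a frozen weight.

    Because the rank r is much smaller than the weight's dimensions, the adapter
    stores far fewer parameters than W itself. `weight` corresponds to the
    adjustable LoRA strength exposed at generation time: 0 disables the
    character adaptation, 1 applies it fully.
    """
    delta = matmul(B, A)
    return [[w + weight * d for w, d in zip(wrow, drow)]
            for wrow, drow in zip(W, delta)]
```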
Practical Applications
Character Design Sheets
A character designer producing a design sheet—front view, side view, back view, three-quarter view, with expression variations—can now generate these views with sufficient consistency to serve as a professional reference document. Previously, this required either manual drawing or extensive post-generation editing to harmonize the AI-generated views.
Book and Comic Illustration
For illustrated books and comics, protagonist consistency across dozens or hundreds of pages is essential. Leonardo’s LoRA-trained characters maintain sufficient consistency for:
- Chapter header illustrations
- Scene illustrations throughout a book
- Comic panels across extended storylines
- Character cards and reference sheets for series bibles
The consistency isn’t perfect—hand refinement is still needed for the most critical panels—but it reduces the editorial correction work from hours to minutes per illustration.
Game Asset Production
Game studios use character-trained LoRAs to generate:
- Multiple expressions/emotions for dialogue systems
- Character variations (armor sets, costume changes, seasonal outfits)
- Promotional art featuring consistent character representation
- In-game collectible cards and achievement badges
- Social media content featuring game characters
Brand Character Development
Brands with mascots or representative characters use Leonardo to generate diverse content featuring their character—different situations, settings, and contexts—while maintaining the visual identity that audiences recognize.
Training Best Practices
Dataset Quality
The quality of the trained LoRA directly depends on the quality and diversity of the training dataset:
- Quantity: 30-50 images minimum; 50-100 for best results
- Diversity: Multiple angles, lighting conditions, expressions, and contexts
- Consistency: All images should clearly depict the same character
- Quality: High-resolution, well-lit images produce better training results
- Background variety: Different backgrounds help the model separate character features from context
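A pre-flight check along these lines can catch the most common dataset problems before a training run is started. The thresholds below are illustrative assumptions based on the guidelines above, not Leonardo’s documented requirements:

```python
def check_dataset(images, min_count=30, min_side=768):
    """Flag common dataset problems before starting a character LoRA training run.

    `images` is a list of dicts like {"path": ..., "width": ..., "height": ...}.
    Thresholds are illustrative; adjust to the platform's actual guidance.
    """
    problems = []
    if len(images) < min_count:
        problems.append(
            f"only {len(images)} images; aim for {min_count}-50 or more"
        )
    small = [im["path"] for im in images
             if min(im["width"], im["height"]) < min_side]
    if small:
        problems.append(
            f"{len(small)} image(s) below {min_side}px on the short side"
        )
    return problems
```

Diversity and subject consistency are harder to verify automatically; those still require a manual pass over the dataset.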
Common Training Mistakes
- Too few images: Below 20 images, the LoRA may not capture enough variation to generalize well
- Too little diversity: Using only images from the same angle and lighting leads to poor performance when generating other angles
- Inconsistent subject: Including images of different characters confuses the training
- Low resolution: Low-quality training images produce low-quality output
- Over-training: Too many training epochs can cause the model to memorize training images rather than learning generalizable features
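Over-training in particular can be guarded against with standard early stopping: halt when a held-out validation loss stops improving. This is a generic technique, not a Leonardo-specific control:

```python
def should_stop(val_losses, patience=3):
    """Early stopping: return True when validation loss has not improved
    for `patience` consecutive epochs, a standard guard against the model
    memorizing training images instead of learning generalizable features.
    """
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return min(val_losses[-patience:]) >= best_before
```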
Recommended Settings
Leonardo provides default training configurations that work well for most character training tasks. Advanced users can adjust:
- Learning rate (lower for more subtle adaptation)
- Training epochs (more for complex characters, fewer for simple designs)
- Regularization strength (to prevent overfitting)
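A sketch of what such a configuration might look like in practice. The parameter names and values here are generic assumptions for character-LoRA training, not Leonardo’s actual defaults:

```python
# Illustrative hyperparameters for a character LoRA; names and values are
# assumptions for a generic trainer, not Leonardo's documented settings.
character_lora_config = {
    "learning_rate": 1e-4,   # lower (e.g. 5e-5) for more subtle adaptation
    "epochs": 10,            # more for complex characters, fewer for simple designs
    "rank": 16,              # low-rank dimension of the adapter matrices
    "weight_decay": 0.01,    # regularization strength to discourage overfitting
}

def adjust_for_complexity(config, complex_character):
    """Heuristic tweak: complex designs get more epochs, simple designs a
    gentler learning rate. Returns a new dict; the base config is unchanged."""
    cfg = dict(config)
    if complex_character:
        cfg["epochs"] = int(cfg["epochs"] * 1.5)
    else:
        cfg["learning_rate"] *= 0.5
    return cfg
```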
Comparing Consistency Approaches
Leonardo Phoenix + LoRA vs. Midjourney Character Reference
Midjourney’s character reference feature works without training—upload a reference image and generate. It’s faster but less consistent. Across a large number of generations, Leonardo’s trained LoRA maintains tighter consistency than Midjourney’s zero-shot reference approach.
Leonardo Phoenix + LoRA vs. Civitai Community LoRAs
Civitai hosts community-trained character LoRAs, but training them requires local GPU hardware and technical expertise. Leonardo’s in-platform training removes the hardware requirement and simplifies the process. The quality is comparable—the difference is accessibility.
Leonardo Phoenix + LoRA vs. IP-Adapter/InstantID
Open-source consistency tools (IP-Adapter, InstantID) offer zero-shot or few-shot consistency through reference image conditioning. They’re effective but require ComfyUI or similar setup. Leonardo’s approach is more integrated and accessible, though the underlying technology is conceptually similar.
Limitations and Honest Assessment
What Works Well
- Maintaining facial identity across different poses and expressions
- Consistent body proportions and build
- Stable clothing and accessory representation
- Reliable skin tone and coloring
What Remains Challenging
- Very extreme angle changes (front view to directly behind)
- Aging or de-aging the character
- Dramatic style shifts (e.g., realistic character to anime style) while maintaining identity
- Hands and finger details (an industry-wide problem, not specific to Leonardo)
The “Last Mile” Problem
Even with strong consistency, professional output typically requires a “last mile” of human refinement—correcting minor facial drift, fixing hand positions, ensuring costume details match exactly. Leonardo dramatically reduces the work required, but it doesn’t eliminate it entirely.
The Broader Impact
Leonardo’s approach to consistency represents a meaningful shift in how AI image generation can be used for professional creative work. By making character consistency achievable through a hosted platform with guided tools, Leonardo has lowered the barrier that previously kept many creative professionals from adopting AI generation.
Consistent characters are now an expected feature of professional AI tools, not an optional extra. Competitors who don’t offer comparable consistency features are at a disadvantage for professional use cases.
References
- Leonardo.ai Official Website. https://leonardo.ai
- Leonardo.ai. “Phoenix Model: Training and Architecture.” Leonardo Blog, 2025.
- Hu, E. J., et al. “LoRA: Low-Rank Adaptation of Large Language Models.” ICLR, 2022.
- Ruiz, N., et al. “DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation.” CVPR, 2023.
- Ye, H., et al. “IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models.” arXiv, 2023.
- Wang, Q., et al. “InstantID: Zero-shot Identity-Preserving Generation in Seconds.” arXiv, 2024.
- Midjourney. “Character Reference Feature.” Midjourney Documentation, 2025.
- ACM SIGGRAPH. “AI-Assisted Character Design: A Professional Survey.” SIGGRAPH, 2025.