The Consistency Problem in AI Art
AI image generation has a dirty secret: it’s excellent at producing single, standalone images, but it struggles with consistency. Ask any AI tool to generate the same character in ten different poses, and you’ll get ten slightly different characters. The face drifts. The proportions shift. The clothing changes. The distinctive details that make a character recognizable are lost between generations.
For casual use, this inconsistency is acceptable—each image stands alone, and minor variations don’t matter. For professional creative work, it’s a dealbreaker. A character designer needs the same character across a design sheet. A book illustrator needs the same protagonist across 30 chapters. A game studio needs the same hero across hundreds of in-game assets.
Leonardo Phoenix, Leonardo.ai’s proprietary generation model, addresses consistency not as an afterthought but as a core design principle. Combined with the platform’s LoRA training capabilities, it creates a system where generating consistent characters and maintaining visual coherence across large projects is both achievable and practical.
How Leonardo Phoenix Achieves Consistency
Architecture-Level Design
Leonardo Phoenix was trained with character consistency as an explicit objective. Unlike general-purpose diffusion models that optimize for diversity (producing varied outputs for the same prompt), Phoenix includes training signals that reward visual coherence when generating related images.
The practical effect: when you provide Phoenix with a character reference and ask for the same character in a different pose or setting, the model maintains more stable features than competing models. Facial structure, body proportions, and distinctive characteristics transfer more reliably between generations.
Character Reference Embeddings
Phoenix supports a character reference system that creates a compressed identity representation from uploaded reference images. This embedding captures:
- Facial geometry (bone structure, eye shape, nose profile)
- Distinctive features (scars, birthmarks, hair style, glasses)
- Body proportions (height, build, limb ratios)
- Skin tone and complexion
When generating new images, the character embedding is injected into the generation pipeline, guiding the diffusion process toward output that matches the reference identity.
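As a conceptual illustration (not Leonardo’s actual pipeline), an identity embedding can be built by averaging per-image face embeddings and blending the result into the prompt conditioning. The `identity_embedding` and `condition` helpers below are hypothetical, and `weight` plays the role of a character-reference strength slider:

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def identity_embedding(reference_embeddings):
    """Average per-image face embeddings into one compact identity vector.

    Averaging and re-normalizing is a common baseline for collapsing several
    reference images into a single identity representation.
    """
    count = len(reference_embeddings)
    dim = len(reference_embeddings[0])
    mean = [sum(e[i] for e in reference_embeddings) / count for i in range(dim)]
    return normalize(mean)

def condition(text_emb, id_emb, weight=0.6):
    """Blend the identity vector into the prompt conditioning.

    Higher `weight` pulls generations toward the reference identity,
    analogous to a character-reference strength setting.
    """
    mixed = [(1 - weight) * t + weight * c for t, c in zip(text_emb, id_emb)]
    return normalize(mixed)
```

At `weight=0` the conditioning reduces to the plain text embedding; at `weight=1` it is pure identity, which in practice tends to override the prompt.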
LoRA Fine-Tuning for Deep Consistency
For the highest level of character consistency, Leonardo’s in-platform LoRA training goes further than character references. By training a dedicated LoRA on a specific character, users create a model modification that deeply encodes the character’s visual identity.
A character LoRA trained on 30-50 reference images produces substantially better consistency than a reference embedding alone. The LoRA captures not just what the character looks like, but how the character’s features behave under different lighting, angles, and expressions.
The training workflow:
- Prepare 30-50 images of the character from different angles, with different expressions, in different lighting
- Upload to Leonardo’s training interface
- Configure training parameters (Leonardo provides recommended defaults for character training)
- Train (typically 15-30 minutes of processing)
- Generate using the trained LoRA with adjustable weight
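The adjustable weight in the last step maps directly onto the LoRA formulation from Hu et al.: the adapter stores two small matrices A (r × in_dim) and B (out_dim × r), and generation applies W + weight · (B @ A) to each frozen base weight W. The sketch below shows that general mechanism, not Leonardo’s internal code:

```python
def matmul(A, B):
    """Multiply two matrices represented as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def apply_lora(W, A, B, weight=1.0):
    """Return W + weight * (B @ A), the low-rank update LoRA adds to a frozen weight.

    Because the rank r is much smaller than the weight's dimensions, the adapter
    stores far fewer parameters than W itself. `weight` corresponds to the
    adjustable LoRA strength exposed at generation time: 0 disables the
    character adaptation, 1 applies it fully.
    """
    delta = matmul(B, A)
    return [[w + weight * d for w, d in zip(wrow, drow)]
            for wrow, drow in zip(W, delta)]
```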
Practical Applications
Character Design Sheets
A character designer producing a design sheet—front view, side view, back view, three-quarter view, with expression variations—can now generate these views with sufficient consistency to serve as a professional reference document. Previously, this required either manual drawing or extensive post-generation editing to harmonize the AI-generated views.
Book and Comic Illustration
For illustrated books and comics, protagonist consistency across dozens or hundreds of pages is essential. Leonardo’s LoRA-trained characters maintain sufficient consistency for:
- Chapter header illustrations
- Scene illustrations throughout a book
- Comic panels across extended storylines
- Character cards and reference sheets for series bibles
The consistency isn’t perfect—hand refinement is still needed for the most critical panels—but it reduces the editorial correction work from hours to minutes per illustration.
Game Asset Production
Game studios use character-trained LoRAs to generate:
- Multiple expressions/emotions for dialogue systems
- Character variations (armor sets, costume changes, seasonal outfits)
- Promotional art featuring consistent character representation
- In-game collectible cards and achievement badges
- Social media content featuring game characters
Brand Character Development
Brands with mascots or representative characters use Leonardo to generate diverse content featuring their character—different situations, settings, and contexts—while maintaining the visual identity that audiences recognize.
Training Best Practices
Dataset Quality
The quality of the trained LoRA directly depends on the quality and diversity of the training dataset:
- Quantity: 30-50 images minimum; 50-100 for best results
- Diversity: Multiple angles, lighting conditions, expressions, and contexts
- Consistency: All images should clearly depict the same character
- Quality: High-resolution, well-lit images produce better training results
- Background variety: Different backgrounds help the model separate character features from context
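A pre-flight check along these lines can catch the most common dataset problems before a training run is started. The thresholds below are illustrative assumptions based on the guidelines above, not Leonardo’s documented requirements:

```python
def check_dataset(images, min_count=30, min_side=768):
    """Flag common dataset problems before starting a character LoRA training run.

    `images` is a list of dicts like {"path": ..., "width": ..., "height": ...}.
    Thresholds are illustrative; adjust to the platform's actual guidance.
    """
    problems = []
    if len(images) < min_count:
        problems.append(
            f"only {len(images)} images; aim for {min_count}-50 or more"
        )
    small = [im["path"] for im in images
             if min(im["width"], im["height"]) < min_side]
    if small:
        problems.append(
            f"{len(small)} image(s) below {min_side}px on the short side"
        )
    return problems
```

Diversity and subject consistency are harder to verify automatically; those still require a manual pass over the dataset.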
Common Training Mistakes
- Too few images: Below 20 images, the LoRA may not capture enough variation to generalize well
- Too little diversity: Using only images from the same angle and lighting leads to poor performance when generating other angles
- Inconsistent subject: Including images of different characters confuses the training
- Low resolution: Low-quality training images produce low-quality output
- Over-training: Too many training epochs can cause the model to memorize training images rather than learning generalizable features
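Over-training in particular can be guarded against with standard early stopping: halt when a held-out validation loss stops improving. This is a generic technique, not a Leonardo-specific control:

```python
def should_stop(val_losses, patience=3):
    """Early stopping: return True when validation loss has not improved
    for `patience` consecutive epochs, a standard guard against the model
    memorizing training images instead of learning generalizable features.
    """
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return min(val_losses[-patience:]) >= best_before
```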
Recommended Settings
Leonardo provides default training configurations that work well for most character training tasks. Advanced users can adjust:
- Learning rate (lower for more subtle adaptation)
- Training epochs (more for complex characters, fewer for simple designs)
- Regularization strength (to prevent overfitting)
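A sketch of what such a configuration might look like in practice. The parameter names and values here are generic assumptions for character-LoRA training, not Leonardo’s actual defaults:

```python
# Illustrative hyperparameters for a character LoRA; names and values are
# assumptions for a generic trainer, not Leonardo's documented settings.
character_lora_config = {
    "learning_rate": 1e-4,   # lower (e.g. 5e-5) for more subtle adaptation
    "epochs": 10,            # more for complex characters, fewer for simple designs
    "rank": 16,              # low-rank dimension of the adapter matrices
    "weight_decay": 0.01,    # regularization strength to discourage overfitting
}

def adjust_for_complexity(config, complex_character):
    """Heuristic tweak: complex designs get more epochs, simple designs a
    gentler learning rate. Returns a new dict; the base config is unchanged."""
    cfg = dict(config)
    if complex_character:
        cfg["epochs"] = int(cfg["epochs"] * 1.5)
    else:
        cfg["learning_rate"] *= 0.5
    return cfg
```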
Comparing Consistency Approaches
Leonardo Phoenix + LoRA vs. Midjourney Character Reference
Midjourney’s character reference feature works without training—upload a reference image and generate. It’s faster but less consistent. Across a large number of generations, Leonardo’s trained LoRA maintains tighter consistency than Midjourney’s zero-shot reference approach.
Leonardo Phoenix + LoRA vs. Civitai Community LoRAs
Civitai hosts community-trained character LoRAs, but training them requires local GPU hardware and technical expertise. Leonardo’s in-platform training removes the hardware requirement and simplifies the process. The quality is comparable—the difference is accessibility.
Leonardo Phoenix + LoRA vs. IP-Adapter/InstantID
Open-source consistency tools (IP-Adapter, InstantID) offer zero-shot or few-shot consistency through reference image conditioning. They’re effective but require ComfyUI or similar setup. Leonardo’s approach is more integrated and accessible, though the underlying technology is conceptually similar.
Limitations and Honest Assessment
What Works Well
- Maintaining facial identity across different poses and expressions
- Consistent body proportions and build
- Stable clothing and accessory representation
- Reliable skin tone and coloring
What Remains Challenging
- Very extreme angle changes (front view to directly behind)
- Aging or de-aging the character
- Dramatic style shifts (e.g., realistic character to anime style) while maintaining identity
- Hands and finger details (an industry-wide problem, not specific to Leonardo)
The “Last Mile” Problem
Even with strong consistency, professional output typically requires a “last mile” of human refinement—correcting minor facial drift, fixing hand positions, ensuring costume details match exactly. Leonardo dramatically reduces the work required, but it doesn’t eliminate it entirely.
The Broader Impact
Leonardo’s approach to consistency represents a meaningful shift in how AI image generation can be used for professional creative work. By making character consistency achievable through a hosted platform with guided tools, Leonardo has lowered the barrier that previously kept many creative professionals from adopting AI generation.
Consistent characters are now an expected feature of professional AI tools, not an optional extra. Competitors who don’t offer comparable consistency features are at a disadvantage for professional use cases.
References
- Leonardo.ai Official Website. https://leonardo.ai
- Leonardo.ai. “Phoenix Model: Training and Architecture.” Leonardo Blog, 2025.
- Hu, E. J., et al. “LoRA: Low-Rank Adaptation of Large Language Models.” ICLR, 2022.
- Ruiz, N., et al. “DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation.” CVPR, 2023.
- Ye, H., et al. “IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models.” arXiv, 2023.
- Wang, Q., et al. “InstantID: Zero-shot Identity-Preserving Generation in Seconds.” arXiv, 2024.
- Midjourney. “Character Reference Feature.” Midjourney Documentation, 2025.
- ACM SIGGRAPH. “AI-Assisted Character Design: A Professional Survey.” SIGGRAPH, 2025.