Z-Image Turbo
A 6-billion-parameter image generation model designed for efficient few-step sampling
What is Z-Image Turbo?
Z-Image Turbo is part of the Z-Image project, focusing on efficient large-scale image generation. The model demonstrates that a 6-billion-parameter foundation can deliver strong performance without relying on very large model sizes or long sampling schedules.
The model uses a single-stream diffusion transformer architecture where text tokens, semantic tokens, and image tokens share one transformer. This approach keeps the design compact and makes efficient use of parameters.
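The single-stream idea can be sketched in a few lines: rather than routing text, semantic, and image tokens through separate branches, they are concatenated into one sequence so that shared attention mixes all three. The dimensions and the plain scaled-dot-product layer below are illustrative assumptions for the sketch, not the actual Z-Image architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # illustrative hidden size (the real model is far larger)

# Three token streams sharing one embedding width
text_tok = rng.normal(size=(16, d))   # text tokens
sem_tok = rng.normal(size=(4, d))     # semantic tokens
img_tok = rng.normal(size=(256, d))   # image patch tokens

# Single stream: one concatenated sequence for one shared transformer
x = np.concatenate([text_tok, sem_tok, img_tok], axis=0)  # (276, d)

def self_attention(x, wq, wk, wv):
    # plain scaled dot-product self-attention over the joint sequence
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(x.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v

wq, wk, wv = (rng.normal(size=(d, d)) * 0.02 for _ in range(3))
out = self_attention(x, wq, wk, wv)
print(out.shape)  # (276, 64): every token attends across all three streams
```

Because there is only one set of attention weights for the whole sequence, parameters are shared across modalities, which is where the efficiency claim comes from.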
Design Goal: Practical deployment on consumer graphics cards with less than 16 GB of VRAM while producing images competitive with much larger systems.
Key Features
Few-Step Generation
Generates images in roughly eight diffusion steps while maintaining stable image quality, keeping the wait between prompt and result short.
Bilingual Text Rendering
Trained to render both English and Chinese text into images with consistent structure, including mixed language content.
Practical Hardware
Fits on graphics cards with less than 16 GB of VRAM using mixed precision and memory optimization features.
Open Source
Model weights and code available under Apache 2.0 license. Full control over your infrastructure.
Single-Stream Transformer
One transformer processes text, semantic, and image tokens together for improved parameter efficiency.
Decoupled-DMD
Advanced distillation approach that separates classifier-free guidance from distribution matching for better stability.
The Z-Image Project
Z-Image (Base)
The 6B-parameter foundation model for image generation. Achieves strong image quality through careful data curation and single-stream transformer design.
Z-Image Turbo
Distilled model for fast text-to-image generation. Reduces sampling steps to around eight while maintaining photorealistic quality and bilingual text rendering.
Z-Image Edit
Continued-training variant for image editing. Accepts input images and structured instructions to produce edited results consistent with original content.
Decoupled-DMD
The distillation approach behind Z-Image Turbo. Separates classifier-free guidance augmentation from distribution matching for easier tuning.
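At a high level, the decoupling means classifier-free guidance is applied as its own step rather than being folded into the distribution-matching objective. The toy below is a conceptual sketch with made-up tensors, not the actual training code: it shows the two pieces as separate computations, a standard CFG combination on the teacher side, then a matching term that pulls the student toward the guided prediction.

```python
import numpy as np

def cfg(eps_cond, eps_uncond, scale):
    # standard classifier-free guidance: extrapolate from the
    # unconditional prediction toward the conditional one
    return eps_uncond + scale * (eps_cond - eps_uncond)

rng = np.random.default_rng(42)
eps_cond = rng.normal(size=8)    # teacher prediction with the prompt
eps_uncond = rng.normal(size=8)  # teacher prediction without it

# Step 1 (guidance, handled on its own): build the guided teacher target
teacher_target = cfg(eps_cond, eps_uncond, scale=3.0)

# Step 2 (distribution matching, handled separately): pull the
# student's few-step prediction toward that fixed target
student_pred = rng.normal(size=8)
dm_loss = np.mean((student_pred - teacher_target) ** 2)
```

Keeping the two terms separate means the guidance scale can be tuned without re-deriving the matching objective, which is the stability benefit described above.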
DMDR
Training method combining distribution matching with reinforcement learning feedback to improve semantic alignment and fine detail.
Installation
Installing Z-Image Turbo locally follows the standard diffusion library workflow. The main steps involve installing the library, preparing PyTorch with GPU support, and loading the pipeline.
Step 1: Install Dependencies
# Install the latest diffusers from source
pip install git+https://github.com/huggingface/diffusers
# Install PyTorch with CUDA support
pip install torch --index-url https://download.pytorch.org/whl/cu124
# Install additional requirements
pip install transformers accelerate safetensors
Step 2: Basic Usage Example
import torch
from diffusers import ZImagePipeline

# Load the pipeline
pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)
pipe.to("cuda")

# Generate an image
prompt = "Young woman in red traditional clothing, clear facial details, soft evening lighting, city background, bilingual signboard text."
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=9,  # about 8 DiT forwards
    guidance_scale=0.0,
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]
image.save("z_image_turbo_example.png")
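On cards near the 16 GB limit, the standard diffusers memory helpers can be layered onto the same pipeline. The sketch below wraps one such variant in a function; whether each helper is available depends on your diffusers and accelerate versions, so treat it as a starting point rather than a guaranteed recipe.

```python
def load_low_vram_pipeline(model_id="Tongyi-MAI/Z-Image-Turbo"):
    # Imports kept inside the function so the sketch stands alone
    import torch
    from diffusers import ZImagePipeline

    pipe = ZImagePipeline.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # mixed precision halves weight memory
    )
    # Stream submodules to the GPU only while they run
    # (requires the accelerate package; replaces pipe.to("cuda"))
    pipe.enable_model_cpu_offload()
    return pipe
```

With offloading enabled, generation calls look the same as in the example above; the trade-off is some extra latency per step as modules move between CPU and GPU.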
Try Z-Image Turbo
Experiment with prompt design, text rendering, and composition. The model responds to different descriptions and supports bilingual text prompts.
What You Can Build
Prompt-Based Illustration Tools
Create applications where users type descriptions and receive images quickly. The few-step nature supports interactive workflows for sketching ideas and exploring variations.
Design Systems with Text Rendering
Design posters, covers, and layouts combining English and Chinese content. The model follows structured prompts describing placement and content for text areas.
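One way to drive text placement is to name each text element's content, language, and position explicitly in the prompt. The prompt below illustrates that style; the exact phrasing that works best is something to experiment with, not a documented prompt grammar.

```python
# Illustrative bilingual poster prompt: each text element states its
# content, language, and position
prompt = (
    "Minimalist concert poster, large English title 'CITY LIGHTS' "
    "across the top, Chinese subtitle '\u57ce\u5e02\u4e4b\u5149' centered beneath it, "
    "small date text 'JUNE 21' in the lower-right corner, "
    "deep blue background with gold accents"
)
```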
Research on Distillation
Study the behavior of Decoupled-DMD and DMDR. Z-Image Turbo offers a practical reference for combining distillation with preference-based objectives.
Editing Pipelines
Combined with Z-Image Edit, Z-Image Turbo covers both prompt-based generation and modification of existing content, supporting complete creative workflows.
Frequently Asked Questions
What GPU do I need to run Z-Image Turbo?
Z-Image Turbo runs on consumer graphics cards with less than 16 GB of VRAM. Using mixed precision and memory optimization, it fits on most modern GPUs.
How does Z-Image Turbo relate to the Z-Image project?
Z-Image Turbo is the distilled variant of the base Z-Image model, optimized for fast text-to-image generation with fewer sampling steps.
Can I fine-tune Z-Image Turbo on my own dataset?
Yes, the model is open source under Apache 2.0 license. You can adapt it to new datasets or integrate it into your own tools and services.
What makes the bilingual text rendering useful?
The model renders both English and Chinese text with consistent structure, useful for posters, titles, diagrams, and images combining graphics with written content.