Z-Image Turbo

A 6-billion-parameter image generation model designed for efficient few-step sampling

6B Parameters
8 NFEs
16GB GPU Friendly

What is Z-Image Turbo?

Z-Image Turbo is part of the Z-Image project, focusing on efficient large-scale image generation. The model demonstrates that a 6-billion-parameter foundation can deliver strong performance without relying on very large model sizes or long sampling schedules.

The model uses a single-stream diffusion transformer architecture where text tokens, semantic tokens, and image tokens share one transformer. This approach keeps the design compact and makes efficient use of parameters.
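The single-stream idea can be sketched with a toy example: tokens from every modality are concatenated into one sequence and processed by a single shared transformer stack. The dimensions and modules below are illustrative only, not the actual Z-Image architecture:

```python
import torch
import torch.nn as nn

# Illustrative dimensions only; not the real Z-Image configuration.
d_model, n_heads, n_layers = 64, 4, 2

# One shared transformer processes every token type (single-stream design).
encoder_layer = nn.TransformerEncoderLayer(
    d_model=d_model, nhead=n_heads, batch_first=True
)
shared_transformer = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)

batch = 1
text_tokens = torch.randn(batch, 77, d_model)      # prompt embeddings
semantic_tokens = torch.randn(batch, 16, d_model)  # high-level semantic tokens
image_tokens = torch.randn(batch, 256, d_model)    # patchified latent tokens

# Concatenate all modalities into a single sequence ...
tokens = torch.cat([text_tokens, semantic_tokens, image_tokens], dim=1)
# ... and run them through the one shared stack.
out = shared_transformer(tokens)

print(out.shape)  # torch.Size([1, 349, 64])
```

Because all token types attend to each other in one stack, no parameters are spent on separate per-modality branches or cross-attention bridges.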

Design Goal: Practical deployment on consumer graphics cards with less than 16 GB of VRAM while producing images competitive with much larger systems.

Key Features

🚀

Few-Step Generation

Generates images in roughly eight diffusion updates while maintaining stable image quality, keeping the wait between prompt and result short.

🌐

Bilingual Text Rendering

Trained to render both English and Chinese text into images with consistent structure, including mixed-language content.

💻

Practical Hardware

Fits on graphics cards with less than 16 GB of VRAM using mixed precision and memory optimization features.

🔓

Open Source

Model weights and code available under Apache 2.0 license. Full control over your infrastructure.

🎨

Single-Stream Transformer

One transformer processes text, semantic, and image tokens together for improved parameter efficiency.

Decoupled-DMD

Advanced distillation approach that separates classifier-free guidance from distribution matching for better stability.

The Z-Image Project

Z-Image (Base)

The 6B-parameter foundation model for image generation. Achieves strong image quality through careful data curation and single-stream transformer design.

Z-Image Turbo

Distilled model for fast text-to-image generation. Reduces sampling steps to around eight while maintaining photorealistic quality and bilingual text rendering.

Z-Image Edit

Continued-training variant for image editing. Accepts input images and structured instructions to produce edited results consistent with the original content.

Decoupled-DMD

The distillation approach behind Z-Image Turbo. Separates classifier-free guidance augmentation from distribution matching for easier tuning.
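At a high level, distribution matching distillation nudges student samples using the difference between a "fake" score trained on student outputs and the teacher's "real" score, and Decoupled-DMD keeps the classifier-free-guidance augmentation out of that matching term. The toy sketch below shows only the distribution-matching gradient; the two score functions are stand-ins, not the actual Z-Image models:

```python
import torch

def fake_score(x):   # stand-in for a score model trained on student samples
    return -x

def real_score(x):   # stand-in for the (guidance-free) teacher score
    return -0.5 * x

x = torch.randn(4, 8, requires_grad=True)  # pretend student samples

# DMD-style update direction: descending this gradient moves samples
# up the real score relative to the fake one.
grad = fake_score(x) - real_score(x)

# Surrogate loss whose gradient w.r.t. x equals `grad` exactly
# (the gradient itself is treated as a constant via detach).
loss = (grad.detach() * x).sum()
loss.backward()

print(torch.allclose(x.grad, grad.detach()))  # True
```

The decoupling means this matching term can be tuned on its own, while guidance augmentation is handled by a separate objective.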

DMDR

Training method combining distribution matching with reinforcement learning feedback to improve semantic alignment and fine detail.

Installation

Installing Z-Image Turbo locally follows the standard diffusers workflow: install the library, set up PyTorch with GPU support, and load the pipeline.

Step 1: Install Dependencies

# Install the latest diffusers from source
pip install git+https://github.com/huggingface/diffusers

# Install PyTorch with CUDA support
pip install torch --index-url https://download.pytorch.org/whl/cu124

# Install additional requirements
pip install transformers accelerate safetensors

Step 2: Basic Usage Example

import torch
from diffusers import ZImagePipeline

# Load the pipeline
pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)
pipe.to("cuda")

# Generate an image
prompt = "Young woman in red traditional clothing, clear facial details, soft evening lighting, city background, bilingual signboard text."

image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=9,  # about 8 DiT forwards
    guidance_scale=0.0,
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]

image.save("z_image_turbo_example.png")

Try Z-Image Turbo

Experiment with prompt design, text rendering, and composition. The model responds to different descriptions and supports bilingual text prompts.
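A quick way to explore compositions is to sweep seeds while holding the prompt fixed. The helper below is a sketch that assumes a `pipe` loaded as in the basic usage example above:

```python
import torch

def generate_variations(pipe, prompt, seeds=(0, 1, 2), steps=9):
    """Render one image per seed so compositions can be compared side by side."""
    images = []
    for seed in seeds:
        generator = torch.Generator("cuda").manual_seed(seed)
        image = pipe(
            prompt=prompt,
            num_inference_steps=steps,
            guidance_scale=0.0,  # Turbo is distilled for guidance-free sampling
            generator=generator,
        ).images[0]
        images.append(image)
    return images
```

Because each image takes only a few steps, sweeping a handful of seeds stays fast enough for interactive exploration.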

What You Can Build

Prompt-Based Illustration Tools

Create applications where users type descriptions and receive images quickly. The few-step nature supports interactive workflows for sketching ideas and exploring variations.

Design Systems with Text Rendering

Design posters, covers, and layouts combining English and Chinese content. The model follows structured prompts describing placement and content for text areas.
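Structured prompts describing text placement can be built programmatically. The helper below is just an illustrative template, not an API provided by Z-Image:

```python
def poster_prompt(title_en: str, title_zh: str,
                  style: str = "minimalist poster") -> str:
    """Build a structured bilingual prompt describing text placement."""
    return (
        f"{style}, clean layout. "
        f'Large English heading "{title_en}" centered at the top, '
        f'Chinese subtitle "{title_zh}" directly below it, '
        "high-contrast typography, balanced composition."
    )

prompt = poster_prompt("Autumn Jazz Night", "秋夜爵士音乐会")
print(prompt)
```

Quoting the exact strings to render and stating their positions explicitly tends to give the model the clearest target for text areas.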

Research on Distillation

Study the behavior of Decoupled-DMD and DMDR. Z-Image Turbo offers a practical reference for combining distillation with preference-based objectives.

Editing Pipelines

Combined with Z-Image Edit, you can cover both prompt-based generation and modification of existing content for complete creative workflows.

Frequently Asked Questions

What GPU do I need to run Z-Image Turbo?

Z-Image Turbo runs on consumer graphics cards with less than 16 GB of VRAM. Using mixed precision and memory optimization, it fits on most modern GPUs.
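If the full bfloat16 pipeline is tight on your card, the standard diffusers memory features can help. The helper below is a sketch that assumes Z-Image's pipeline exposes the usual DiffusionPipeline offloading and VAE tiling methods; adjust if the released pipeline differs:

```python
def load_pipeline_low_vram(model_id: str = "Tongyi-MAI/Z-Image-Turbo"):
    """Load the pipeline with standard diffusers memory optimizations.

    Assumes ZImagePipeline inherits the usual DiffusionPipeline methods.
    """
    import torch
    from diffusers import ZImagePipeline

    pipe = ZImagePipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    # Keep only the active submodule on the GPU; the rest waits in CPU RAM.
    pipe.enable_model_cpu_offload()
    # Decode latents in tiles to cap VAE memory at high resolutions.
    pipe.vae.enable_tiling()
    return pipe
```

With model CPU offload you call the pipeline exactly as in the basic example; generation is somewhat slower, but peak VRAM drops substantially.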

How does Z-Image Turbo relate to the Z-Image project?

Z-Image Turbo is the distilled variant of the base Z-Image model, optimized for fast text-to-image generation with fewer sampling steps.

Can I fine-tune Z-Image Turbo on my own dataset?

Yes, the model is open source under Apache 2.0 license. You can adapt it to new datasets or integrate it into your own tools and services.

What makes the bilingual text rendering useful?

The model renders both English and Chinese text with consistent structure, useful for posters, titles, diagrams, and images combining graphics with written content.