CFG :: definition and meaning
CFG Scale (Classifier-Free Guidance Scale) acts as a dial that controls how closely Stable Diffusion follows your written instructions (your prompt) when generating an image.
A higher CFG Scale means the AI sticks tightly to your words, while a lower value gives it more freedom to get creative and interpret your prompt loosely.
ELI5 (Explain Like I'm 5)
Imagine you're telling a friend how to draw a monster.
If you want them to follow your instructions exactly, like saying "two big eyes, a pointy nose, and a wide scary mouth," you'd use a high CFG Scale.
But, if you just say "a scary monster" and let them fill in the rest, that's like using a lower CFG Scale: it gives them more room to make their own monster.
Advanced
CFG Scale works by manipulating the way Stable Diffusion interprets the noise pattern it starts with. A higher CFG Scale reduces the influence of this noise during the generation process. This forces the AI to focus more heavily on the specific details in your prompt. Lower CFG Scale values allow the noise pattern to play a bigger role, leading to more unexpected and potentially surprising results.
Why does AnimateDiff require a high CFG?
While it varies, the best results are usually obtained with values between 12 and 18.
- Maintaining Consistency Across Frames:
AnimateDiff relies on generating multiple images (frames) sequentially to create the animation.
A high CFG helps ensure consistency between frames by prioritizing the prompt details throughout the generation process.
This reduces the influence of randomness and encourages the AI to maintain the same visual style, objects, and overall theme across the animation.
- Controlling the Flow of Motion:
The animation relies on subtle changes in the prompt between frames to create the illusion of movement.
A high CFG allows for finer control over these prompt variations, ensuring the changes translate into the desired motion within the animation.
This mitigates unwanted deviations or distortions that might arise from excessive creative freedom granted by a lower CFG.
- Combating the Accumulation of Errors:
Each frame in the animation builds upon the previous one. With a lower CFG, even minor deviations in earlier frames can snowball into larger inconsistencies as the animation progresses.
A high CFG helps prevent this by ensuring each frame adheres closely to the prompt, minimizing the accumulation of errors and maintaining the overall coherence of the animation.
Delving deeper
During the denoising steps of image generation, CFG Scale acts as a weighting factor between two predictions the model makes at each step: one guided by your prompt and one unconditional.
The unconditional prediction represents the inherent distribution of the model, independent of any specific prompt.
Latent_Space_Update = α * Prompt_Embedding + (1 - α) * Unconditional_Embedding
α (alpha): Represents the CFG Scale value (in practice usually above 1, such as the 12 to 18 range mentioned above for AnimateDiff).
Prompt_Embedding: The model's prediction when guided by the information encoded from the text prompt.
Unconditional_Embedding: The model's prediction without the prompt, representing its inherent tendencies.
When α is greater than 1, the weight (1 - α) becomes negative, so the update is pushed past the prompt-guided prediction and away from the unconditional one.
With a higher CFG (higher α), the prompt embedding exerts a stronger influence, guiding the latent space closer to the desired representation based on the text description.
Conversely, a lower CFG (lower α) allows the unconditional embedding to play a more significant role, incorporating the model's inherent characteristics and potentially introducing more creative interpretations.
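To make the weighting concrete, here is a minimal numerical sketch of the formula above. It is only an illustration: the random arrays are stand-ins for the two predictions, and the function name apply_cfg, the array shapes, and the chosen scale values are assumptions made for this example, not part of Stable Diffusion's actual code.

```python
import numpy as np

def apply_cfg(prompt_pred, uncond_pred, cfg_scale):
    """Blend two predictions with the weighted formula from the text."""
    return cfg_scale * prompt_pred + (1.0 - cfg_scale) * uncond_pred

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a prediction made without the prompt, and a
# slightly different prediction made with the prompt.
uncond_pred = rng.normal(size=(4, 8, 8))
cond_pred = uncond_pred + 0.1 * rng.normal(size=(4, 8, 8))

for scale in (0.0, 1.0, 7.0, 15.0):
    guided = apply_cfg(cond_pred, uncond_pred, scale)
    # How far the blended result sits from the unconditional prediction.
    distance = np.linalg.norm(guided - uncond_pred)
    print(f"cfg_scale={scale:>4}: distance from unconditional = {distance:.3f}")
```

At a scale of 0 the result is just the unconditional prediction, at 1 it is exactly the prompt-guided one, and above 1 the result keeps moving in the same direction, so the printed distance grows in proportion to the scale.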
Why does a high CFG give saturated colors?
With a high CFG, the difference between the prompt-guided and unconditional predictions is amplified at every denoising step.
Large values can push the result beyond the range the model normally produces, which tends to show up as exaggerated contrast and over-saturated colors.
High CFG also restricts the exploration of different color combinations, so the AI settles on a more limited color range.
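One rough way to picture this is to apply the weighted formula from the "Delving deeper" section to made-up numbers and count how many values land outside a fixed range. This is a toy sketch, not a measurement of real images: the random arrays, the helper name fraction_out_of_range, and the limit of 3.0 are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins for the predictions without and with the prompt.
uncond_pred = rng.normal(size=10_000)
cond_pred = uncond_pred + 0.2 * rng.normal(size=10_000)

def fraction_out_of_range(cfg_scale, limit=3.0):
    """Share of blended values whose magnitude exceeds an arbitrary limit."""
    guided = cfg_scale * cond_pred + (1.0 - cfg_scale) * uncond_pred
    return float(np.mean(np.abs(guided) > limit))

for scale in (1.0, 7.0, 12.0, 18.0):
    share = fraction_out_of_range(scale)
    print(f"cfg_scale={scale:>4}: {share:.1%} of values beyond the limit")
```

In this toy setup the share of extreme values rises as the scale goes up. In a real image, values pushed to the extremes like this are one plausible source of the harsh, over-saturated look.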
CFG and steps
A lower CFG allows the AI to explore various paths (color combinations, object arrangements) within the latent space based on the noise pattern.
In contrast, a high CFG restricts this exploration, forcing the AI to navigate a more specific, potentially narrower path towards the desired outcome.
This narrower path often requires more steps to reach the final destination.
Imagine painting a portrait: with a high CFG, the AI needs to overcome any inherent randomness in the noise pattern to achieve the specific details in your prompt.
This "corrective" process requires more steps compared to a lower CFG, where the AI can leverage some randomness to fill in the gaps in your instructions.