💿 Information
FonAI consists of a frozen T5-XXL encoder, which maps the input text to a sequence of embeddings, and super-resolution diffusion models that generate the images. Every diffusion model is conditioned on the text embedding sequence and uses classifier-free guidance. FonAI relies on new sampling techniques that allow large guidance weights without the loss of sample quality observed in prior work, which yields images with higher fidelity. Although it is conceptually simple and easy to train, FonAI produces surprisingly strong results and outperforms other methods.
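The guidance and sampling described above can be sketched in Python. This is a minimal illustration under stated assumptions, not FonAI's implementation. It uses the standard classifier-free guidance combination, eps_hat = eps_uncond + w * (eps_cond - eps_uncond) with guidance weight w, and the standard epsilon-parameterized estimate of the clean image, x0 = (x_t - sqrt(1 - alpha_bar) * eps_hat) / sqrt(alpha_bar). The source does not say which sampling techniques FonAI uses, so the sketch substitutes dynamic thresholding, a known technique for sampling with large guidance weights: it takes s as a high percentile of |x0| (never below 1), clips x0 to [-s, s], and divides by s, so that predictions pushed far outside [-1, 1] by a large w are rescaled instead of saturating. The function names, the flat-list representation of images, and the default percentile of 0.995 are all hypothetical.

```python
import math
from typing import List

# Hypothetical representation: an image (or noise prediction) as a flat list of floats.
Vector = List[float]


def classifier_free_guidance(eps_uncond: Vector, eps_cond: Vector, w: float) -> Vector:
    """Standard classifier-free guidance: eps_uncond + w * (eps_cond - eps_uncond).

    w = 1 returns the conditional prediction; w > 1 extrapolates further
    toward the text-conditioned prediction.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("noise predictions must have the same length")
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def predict_x0(x_t: Vector, eps: Vector, alpha_bar: float) -> Vector:
    """Invert x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps for x0."""
    if not 0.0 < alpha_bar <= 1.0:
        raise ValueError("alpha_bar must be in (0, 1]")
    sa, s1 = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - s1 * e) / sa for x, e in zip(x_t, eps)]


def dynamic_threshold(x0: Vector, percentile: float = 0.995) -> Vector:
    """Dynamic thresholding (assumed technique): clip to [-s, s], then divide by s.

    s is the nearest-rank `percentile` of |x0|, floored at 1, so when that
    percentile is at most 1 this reduces to ordinary clipping to [-1, 1].
    """
    if not 0.0 < percentile <= 1.0:
        raise ValueError("percentile must be in (0, 1]")
    if not x0:
        return []
    abs_sorted = sorted(abs(v) for v in x0)
    k = max(0, math.ceil(percentile * len(abs_sorted)) - 1)
    s = max(abs_sorted[k], 1.0)
    return [min(max(v, -s), s) / s for v in x0]


def guided_x0(x_t: Vector, eps_uncond: Vector, eps_cond: Vector, w: float,
              alpha_bar: float, percentile: float = 0.995) -> Vector:
    """One guided x0 prediction: guidance, x0 estimate, then dynamic thresholding."""
    eps_hat = classifier_free_guidance(eps_uncond, eps_cond, w)
    return dynamic_threshold(predict_x0(x_t, eps_hat, alpha_bar), percentile)


if __name__ == "__main__":
    # A large guidance weight (w = 7) pushes the raw x0 estimate outside [-1, 1];
    # thresholding rescales it back into range.
    print(guided_x0([0.9, -0.4], [0.1, 0.0], [-0.5, 0.3], w=7.0, alpha_bar=0.5))
```

When every value of x0 already lies in [-1, 1], s is 1 and the thresholding step leaves the prediction untouched; it only intervenes when a large guidance weight drives the estimate out of range, which is the failure mode the paragraph describes.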
