Kandinsky is an image-generation neural network developed by a team at Sberbank. The model works in 101 languages and, most importantly, in Russian. Unlike Midjourney, its interface will be intuitive for any user from Russia and the CIS.
It uses a CLIP model as its text and image encoder, together with a diffusion image prior that maps between the latent spaces of the CLIP modalities. This approach improves the visual quality of the model's output and opens up new possibilities for blending images and for text-guided image manipulation.
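To make the idea of blending in a shared latent space more concrete, here is a toy sketch. It assumes that a text prompt has already been mapped by the prior into the image-embedding space, so that text and image conditions are vectors of the same size and can be combined as a weighted sum. The function name `blend_embeddings` and the 4-dimensional vectors are hypothetical placeholders, not part of the Kandinsky code base, and the real model works with much larger CLIP embeddings.

```python
from typing import List, Sequence


def blend_embeddings(
    embeddings: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Return the weighted sum of equal-length embedding vectors."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    if len(embeddings) != len(weights):
        raise ValueError("one weight per embedding is required")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same dimension")

    blended = [0.0] * dim
    for emb, w in zip(embeddings, weights):
        for i, value in enumerate(emb):
            blended[i] += w * value
    return blended


# Toy vectors standing in for real embeddings (assumption: both already
# live in the same image-embedding space).
text_emb_mapped = [1.0, 0.0, 2.0, 0.0]  # stand-in for prior(text embedding)
image_emb = [0.0, 1.0, 0.0, 2.0]        # stand-in for a CLIP image embedding

mixed = blend_embeddings([text_emb_mapped, image_emb], [0.5, 0.5])
print(mixed)  # [0.5, 0.5, 1.0, 1.0]
```

In this sketch the weights control how strongly each condition influences the result; a decoder would then generate an image from the blended vector.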
Kandinsky 2.2 was trained on the large-scale image-text dataset LAION HighRes and fine-tuned on our internal datasets.