Diffusion models generate incredible images by learning to reverse the process that, among other things, causes ink to spread through water.
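The "spreading ink" process the models learn to reverse can be made concrete: in the forward direction, data is repeatedly mixed with Gaussian noise until nothing but noise remains. Below is a minimal NumPy sketch of that forward (noising) half, assuming a simple linear variance schedule; the function name and parameters are illustrative, not from any particular library.

```python
import numpy as np

def forward_diffuse(x0, t, betas):
    """Sample a noised version of clean data x0 after t diffusion steps.

    Uses the closed-form shortcut: instead of adding noise t times,
    jump straight to step t by scaling the signal down and the
    noise up, so x0 gradually dissolves into pure Gaussian noise.
    """
    alphas = 1.0 - betas
    alpha_bar = np.prod(alphas[:t])  # cumulative signal-retention factor
    noise = np.random.randn(*x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise

# Illustrative linear schedule over 1,000 steps (an assumption here,
# though similar schedules are common in the diffusion literature).
betas = np.linspace(1e-4, 0.02, 1000)
x0 = np.ones((8, 8))  # stand-in for an "image"
x_noisy = forward_diffuse(x0, 1000, betas)  # essentially pure noise
```

A generative diffusion model is then trained to run this process in reverse, step by step, turning noise back into data.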
Ask DALL·E 2, an image generation system created by OpenAI, to paint a picture of “goldfish slurping Coca-Cola on a beach,” and it will spit out surreal images of exactly that. The program would have encountered images of beaches, goldfish and Coca-Cola during training, but it’s highly unlikely it would have seen one in which all three came together. Yet DALL·E 2 can assemble the concepts into something that might have made Dalí proud.
DALL·E 2 is a type of generative model — a system that uses training data to generate something new that’s comparable in quality and variety to the data it learned from. This is one of the hardest problems in machine learning, and getting to this point has been a difficult journey.
The first important generative models for images used an approach to artificial intelligence called a neural network — a program composed of many layers of computational units called artificial neurons. But even as the quality of their images got better, the models proved unreliable and hard to train. Meanwhile, a powerful generative model — created by a postdoctoral researcher with a passion for physics — lay dormant, until two graduate students made technical breakthroughs that brought the beast to life.