Meet Imagen, Google’s new AI that turns text into images

A sample image created from the text: ‘A blue jay standing on a large basket of rainbow macarons’.

A little over a year ago, OpenAI unveiled DALL-E, an artificial intelligence model capable of creating an image from text.

Now Google Research has unveiled “Imagen,” a new model that the American company says is even more powerful and efficient.

Google describes this innovation as “a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding.”

Behind this complex explanation lies a much simpler reality: a short text description is all it takes to generate any number of high-quality, realistic images.

“Imagen” is capable of combining concepts and attributes to create all kinds of images you might think up.

The demonstrations on the “Imagen” website include images of a cobra made of corn and a small house made of sushi.

This kind of software could easily find a use at many digital companies, helping them deliver fast, effective and even personalized communications campaigns.

For artists, the creative possibilities could complement their work in a multitude of ways.

The possibilities offered by these models are almost infinite. And yet, a mainstream tool is not on the agenda because of one particularly thorny problem: algorithmic bias.

In its most basic definition, algorithmic bias means that the results delivered by a learning algorithm are not fair: the model is trained on very large quantities of data produced by humans, and is therefore not neutral.

To build these models, engineers use deep learning algorithms and train them on huge amounts of data.

Inherent stereotypes and prejudice

The goal is to answer a user’s request with the highest possible precision. To achieve such a feat, massive amounts of data must be processed, in all forms.

Datasets drawn from the internet feature heavily in the development of artificial intelligence.

AI feeds on everything that can be found on the web to hone its “intelligence,” including stereotypes, prejudice and discrimination.
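The mechanism can be illustrated with a toy sketch in Python. This is not how Imagen works and the numbers are not real data: the caption counts in `TRAINING_CAPTIONS` are invented for illustration, and `learned_distribution` and `generate` are hypothetical helper names. The sketch only shows that a model which reproduces the frequencies in its training data also reproduces whatever skew that data contains.

```python
import random
from collections import Counter

# Invented caption counts standing in for a skewed web-scraped dataset.
# These figures are an assumption made for illustration, not measurements.
TRAINING_CAPTIONS = {
    ("doctor", "man"): 80,
    ("doctor", "woman"): 20,
    ("nurse", "man"): 10,
    ("nurse", "woman"): 90,
}


def learned_distribution(profession):
    """Share of each attribute seen alongside a profession in the training data."""
    counts = {
        attr: n
        for (prof, attr), n in TRAINING_CAPTIONS.items()
        if prof == profession
    }
    total = sum(counts.values())
    return {attr: n / total for attr, n in counts.items()}


def generate(profession, n_samples, seed=0):
    """Sample attributes in proportion to their training frequency.

    A deliberately naive stand-in for a generative model: it has no notion
    of fairness and simply mirrors the data it was given.
    """
    rng = random.Random(seed)
    dist = learned_distribution(profession)
    attrs = list(dist)
    weights = [dist[a] for a in attrs]
    return Counter(rng.choices(attrs, weights=weights, k=n_samples))


if __name__ == "__main__":
    for profession in ("doctor", "nurse"):
        print(profession, dict(generate(profession, 1000)))
```

Running the script prints, for each profession, how often each attribute was generated. The split follows the training counts, so the imbalance in the input shows up again in the output.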

In presenting its new model, Google once again draws attention to this reality, which is keeping the company from deploying it.

“There are several ethical challenges facing text-to-image research broadly,” explains Google.

“Downstream applications of text-to-image models are varied and may impact society in complex ways. The potential risks of misuse raise concerns regarding responsible open-sourcing of code and demos.”

For the moment, as was the case with DALL-E, the American company has decided not to publish the source code or offer a public demonstration.

“Preliminary assessment also suggests Imagen encodes several social biases and stereotypes, including an overall bias towards generating images of people with lighter skin tones and a tendency for images portraying different professions to align with Western gender stereotypes.”

The company hopes to make more progress on these remaining challenges in order to be able to open up its model to users while tackling potential bias.
