In this “AI 101” series of blog posts, the team at Lewis Silkin unravel the legal issues involved in the development and use of AI text and image generation tools. This initial post explores how these tools operate and considers the legal cases being brought against major AI platforms in the UK and US. Later posts in the series will consider who owns the output of AI image generators; the infringement risks involved in using output created by AI; the ethical implications of programming AI; and what regulation is on the horizon – so keep an eye out for those!

Over the past year there has been an explosion in the use of AI models that generate text, art and music. These new platforms showcase an astounding (and at times, slightly disconcerting) level of AI technology – accessible to the masses – which will revolutionise the way we work, write and create as a society in the modern age.

However, this development and uptake of AI has taken place against a backdrop of legal uncertainty surrounding the extent to which the creation and use of these AI models is consistent with the rights of those whose creative works have been used to ‘train’ the models. Litigation has now been commenced in the US and in the UK which may finally provide some clarity on the legality of AI image generation tools. But the question to consider at the outset is: how exactly do these tools work and how does the law apply to their creation?

How do AI text generators (like ChatGPT) work?

By now, you will probably have heard of ChatGPT – the AI chatbot launched by OpenAI back in November 2022 that has set the internet abuzz, owing to its ability to automatically generate text answers similar to those of a (very real, knowledgeable and quick-thinking) human being in response to prompts entered by the user.

Whether you want ChatGPT to write you a degree-level history essay on the Russian Revolution; explain the Theory of Relativity to you as if you were an 8 year-old; draft a script for an important marketing pitch you have coming up; create a healthy meal plan for the next 7 days using ingredients you have in your fridge; or write a limerick about the dangers of robot intelligence – ChatGPT will get back to you in a matter of seconds. Its unique ability to factor in the context of a particular prompt and respond conversationally is perhaps what sets the tool apart from its AI ancestors. Users can ask the AI to tailor a response to make it “funnier”, “more detailed” or to appeal to a specific audience, with impressive results.

At a high level, text and language AI tools like ChatGPT (other chatbots are available) are built on a Large Language Model or “LLM”, trained on a massive volume of sample text in different languages and domains. The LLM behind ChatGPT (GPT-3) has been trained on billions of English-language data points inputted directly and scraped from the internet, in addition to specific computer coding languages such as JSX and Python. Once the AI has been fed with this data, the tool undergoes a period of ‘supervised learning’ using question-and-answer pairs written by human AI trainers, teaching the model to map inputs to desired outputs. After further periods of monitoring, fine-tuning and rewarding the AI for correct answers – using comparison data and ranking answers based on their quality – the AI is gradually optimised using a “Reinforcement Learning” algorithm.
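The first stage of that pipeline, learning from raw text, can be caricatured with a toy next-word predictor. This is a deliberately minimal sketch: a real LLM uses a neural network with billions of parameters rather than simple word counts, and the corpus here is invented for illustration.

```python
from collections import Counter, defaultdict

# Tiny stand-in for the billions of words an LLM is trained on.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each preceding word (a "bigram" model) -
# the crudest possible version of learning patterns from sample text.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the continuation seen most often in training."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" - the most frequent word after "the"
```

The supervised and reinforcement-learning stages described above then adjust those learned patterns so the model's answers better match what human trainers rank highly.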

The result? An AI only in its infancy that can pass the Wharton Business School exam with a Grade B today (and which may well pass the Turing Test at some point in the near future, if it hasn’t already).

How do AI image generators work?

The current generation of AI image generation tools such as Stable Diffusion, Midjourney and DALL·E 2 are designed to take a text description or prompt from a user and generate an image that matches the prompt. For example, if we ask the DALL·E 2 AI generator for a “photograph of a cat wearing a suit” it generates the four images shown below:

Four AI generated photographs of cats wearing suits

As a first step in learning how to create images in this way, these tools draw upon huge datasets of pairs of text and images. This process allows the AI to learn connections between the structure, composition and data within an image and the accompanying text. For example, the Stable Diffusion AI has utilised a dataset called LAION-5B, which is a set of 5.85 billion links to images stored on websites, paired with short descriptions of each image. By analysing images from the dataset which are described as ‘cat’, the AI is able to successfully identify when an image does or does not contain a cat. An example of some of the ‘cat’ images and their descriptions from the LAION-5B dataset is shown below:

An extract of entries in the LAION-5B dataset showing images tagged 'cat'.
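To make the idea of a text–image dataset concrete, here is a much-simplified sketch of what one entry-per-image structure might look like. The URLs and captions are invented; real LAION-5B entries also carry metadata such as image dimensions and a similarity score.

```python
# Hypothetical entries in the style of a text-image dataset:
# each record pairs a link to an image with a short description.
dataset = [
    {"url": "https://example.com/img1.jpg", "caption": "a tabby cat on a sofa"},
    {"url": "https://example.com/img2.jpg", "caption": "red sports car at sunset"},
    {"url": "https://example.com/img3.jpg", "caption": "black cat wearing a suit"},
]

# During training, the captions let the model associate image content with
# words. Here we simply select the entries whose description mentions a cat.
cat_images = [entry["url"] for entry in dataset
              if "cat" in entry["caption"].lower()]

print(cat_images)  # the first and third links
```

The real training process is far subtler than keyword matching, but the raw material is the same: millions of images, each tied to a snippet of descriptive text.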

The AI model is then trained to be able to generate images from random noise or static, through a process called diffusion. In this second training process, the AI is given an image to which small amounts of random noise are progressively added, corrupting the image and destroying its structure until it appears to be entirely random static. As this process occurs the AI attempts to understand how the addition of noise changes the image. An example of an image undertaking this training process is shown below:

An illustration of the diffusion training process, where increasing levels of static are added to a photograph of a cat.
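The forward corruption step described above can be sketched in a few lines of Python. This is an illustrative toy, not Stable Diffusion's actual noise schedule: the "image" is just a short list of numbers standing in for pixel values, and the blending weights are arbitrary.

```python
import random

random.seed(0)

# Stand-in for an image's pixel values.
image = [0.2, 0.8, 0.5, 0.9]

def add_noise(pixels, noise_scale=0.3):
    """One corruption step: blend each pixel with random Gaussian noise."""
    return [(1 - noise_scale) * p + noise_scale * random.gauss(0, 1)
            for p in pixels]

# Apply the corruption step repeatedly; after enough steps the original
# structure is destroyed and the "image" is essentially random static.
steps = [image]
for _ in range(10):
    steps.append(add_noise(steps[-1]))

print(steps[0])   # the original image
print(steps[-1])  # close to pure noise
```

During training the model sees many such corrupted versions and learns to predict how each dose of noise changed the image, which is the knowledge it later runs in reverse.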

The AI model is then trained on a reverse process, which is designed to restore or create structure and recognisable images from random static – as shown below:

A screenshot of the reverse process, where the AI model attempts to restore a recognisable image from random static.

Finally, once it has been trained, the AI model can combine these two techniques to generate entirely new images, like the cats in suits shown above. To generate an image the AI model is given random static along with the user’s text description of the image they are seeking. The model then runs the reverse diffusion process (the noise-removal step shown above) and attempts to create an image matching the user’s prompt.
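The generation loop, starting from static and repeatedly denoising it, can be caricatured as follows. Everything here is invented for illustration: in particular, a real model predicts the denoised image with a neural network conditioned on the prompt, whereas this toy simply nudges the pixels toward a hard-coded target.

```python
import random

random.seed(1)

# Invented prompt-to-"image" lookup, standing in for what the trained
# network has learned to associate with a text description.
targets = {"cat in a suit": [0.2, 0.8, 0.5, 0.9]}

def denoise_step(pixels, target, strength=0.2):
    """One reverse-diffusion step: move each pixel toward the prediction."""
    return [p + strength * (t - p) for p, t in zip(pixels, target)]

prompt = "cat in a suit"
pixels = [random.gauss(0, 1) for _ in range(4)]  # start from pure static

# Iteratively refine the static; structure gradually emerges.
for _ in range(50):
    pixels = denoise_step(pixels, targets[prompt])

print(pixels)  # ends up very close to the target "image"
```

The key point for the legal discussion that follows is that the finished model stores learned statistical associations like these, rather than a library of the original training images.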

The legal backlash

Recently, disputes have been commenced in the UK and in the US against companies which have developed AI image generators. In the UK, stock image company Getty Images has begun a lawsuit against Stability AI, the company responsible for the release of the Stable Diffusion image generator. In the US a separate class action lawsuit has been filed against Stability AI and the company responsible for the Midjourney AI art tool.

In a press release announcing the UK litigation, Getty has claimed that Stability AI “unlawfully copied and processed millions of images protected by copyright and the associated metadata owned or represented by Getty Images”. While the litigation has only just commenced and the parties have not yet set out their arguments, if the matter proceeds to trial it is likely that Stability AI will argue that its use of any Getty Images material in the training process was transient or incidental and therefore does not infringe copyright. Stability AI is likely to argue that its AI model does not contain copies of Getty Images content, and that any use of the images to train the model was analogous to a human browsing the internet or using a search engine, which the UK Supreme Court has indicated does not, in the normal course of affairs, infringe copyright. The UK courts have not previously been asked to apply this copyright exception to the AI training process and any decision has the potential to provide useful clarity on the legality of developing and training generative AI.

Even if Stability AI avoids copyright liability, it may be bound by the terms of use of the Getty Images website, which expressly forbid “any data mining, robots or similar data gathering or extraction methods”. If Stability AI trained Stable Diffusion on images from the Getty website, it may be in breach of these terms of use and face the prospect of an injunction or damages award.

This litigation and the demands of the growing AI sector may also prompt the UK government to consider whether the law should be amended to facilitate training AI programmes. In June 2022 the government announced a plan to introduce a new copyright and database exception permitting text and data mining for any purpose. Text and data mining involves the computerised analysis of large amounts of information to identify patterns, trends and other useful information, and is a technique often used in training AI systems. The UK government proposals were criticised by representatives of the creative industries, and in November 2022 the plans were withdrawn by the government to allow further reflection. It will therefore be interesting to see how this litigation impacts the government’s priorities in this area.