OpenAI’s new model ‘Sora’ creates ultra-realistic videos from text


Published 17 February 2024
- By Editorial Staff
The image above is entirely generated from an arbitrary text description.

On Thursday, OpenAI presented Sora, a new model that can generate high-definition videos of up to one minute based on text descriptions. However, Sora, which means “sky” in Japanese, will not be available to the public in the near future. Instead, it is being released to a small group of researchers and academics to evaluate the risks of misuse.

“Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background”, OpenAI writes on its website. “The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.”

One of the examples of videos generated by Sora shows a couple walking through a snowy Tokyo, while cherry blossoms and snowflakes swirl around them.

OpenAI claims that the model works thanks to a “deep understanding of language”, which allows it to correctly interpret text descriptions. Yet, like almost all AI-based image and video generators, Sora is not perfect. In one example, the description mentions a Dalmatian dog looking out of a window and people “walking and cycling along the canals”, yet the people and streets are completely missing from the generated video.

OpenAI also warns that the model may have difficulty understanding causal relationships. For example, it may generate a video of a person taking a bite out of a cookie, after which the cookie shows no bite mark.

Sora is not the first text-to-video model on the market. Other companies, including Meta, Google and Runway, have either hinted at plans for similar tools or already launched them. However, no other tool can yet generate 60-second videos. Sora also generates entire videos at once, instead of assembling them frame by frame like other models. This helps subjects in the video stay consistent even when they temporarily disappear from view.

Causing concern and distaste

The emergence of text-to-video tools has raised concerns that they make it easier to create realistic fake videos. Generative AI has also been criticised by artists and creative professionals, who worry that the technology will be used to replace jobs and that it relies on copyrighted material.

OpenAI says it is working with experts in areas such as “disinformation, hateful content and bias” to test the tool before its public release. The company is also developing tools to detect videos generated by Sora and plans to include metadata in the generated videos for easier detection.

OpenAI declined to reveal to The Times how Sora had been trained, except to say that it used both “publicly available videos” and videos licensed from copyright holders.
