Producing video content is a particular challenge for generative AI models, which have no real concept of space or physics and are essentially dreaming up clips frame by frame. That approach can lead to obvious errors and inconsistencies, as we noted in December when OpenAI's Sora served up a video with a disappearing taxi.
It's these specific problems that AI video company Runway says it has made progress in fixing with its new Gen-4 models. The new models offer "a new generation of consistent and controllable media," according to Runway, with characters, objects, and scenes now much more likely to look the same across an entire project.
If you've experimented with AI video, you'll know that many clips are brief, show slow movement, and avoid elements that leave the frame and come back in, usually because the AI will render them differently the second time around. People merge into buildings, limbs transform into animals, and entire scenes mutate as the seconds pass.
This is because, as you might have gathered by now, these AIs are essentially probability machines. They know, more or less, what a futuristic cityscape should look like, based on scraping lots of futuristic cityscapes—but they don’t understand the building blocks of the real world, and can’t keep a fixed idea of a world in their memories. Instead, they keep reimagining it.
Runway is aiming to fix this with reference images that it can keep going back to while it invents everything else in the frame: People should look the same from frame to frame, and there should be fewer issues with principal characters walking through furniture and transforming into walls.
The new Gen-4 models can also “understand the world” and “simulate real-world physics” better than ever before, Runway says. The benefit of going out into the world with an actual video camera is that you can shoot a bridge from one side, then cross over and shoot the same bridge from the other side. With AI, you tend to get a different approximation of a bridge each time—something Runway wants to tackle.
Have a look at the demo videos put together by Runway and you’ll see they do a pretty good job in terms of consistency (though, of course, these are hand-picked from a wide pool). The characters in this clip look more or less the same from shot to shot, albeit with some variations in facial hair, clothing, and apparent age.