Everyone Should Be Afraid of OpenAI's New AI Video Maker

We can all agree that AI has gone too far.

OpenAI CEO Sam Altman announced Sora, the company's new AI video generator, yesterday. Like DALL-E and ChatGPT before it, Sora takes natural language prompts from the user and carries them out as asked. But instead of responding with text or an image, Sora produces full, realistic video that outclasses anything I've seen from an AI program before. And no, I'm not exaggerating.

Sora's first videos inspire fear

On Sora's announcement page, OpenAI shares a collection of videos showing off what it can do. They're beautiful, in the worst way. Animation appears to be one of Sora's strengths: prompts like "a short fluffy monster kneeling beside a melting red candle" or "a cartoon kangaroo disco dances" yield results that aren't quite Pixar or DreamWorks, but most look professional, and some look much better than others. I don't think many people would guess at first glance that no humans were involved.

But while Sora's potential animation applications are scary, its photorealistic videos are truly terrifying. OpenAI showed off "drone footage" of a historic church on the Amalfi Coast, a crowd celebrating Chinese New Year at a parade, and a tracking shot down a snowy Tokyo street. Seeing these videos for the first time, you'd think they were real. Some of them still don't look AI-generated to me, even though I know they are.

Even the videos with telltale AI glitches, like assets warping and shifting around, could pass for video compression artifacts. There's a video of puppies playing in the snow: a few flaws give away that it isn't real, but the physics and the beauty of the image sell it. None of these dogs are real? They clearly love the snow. God, am I already in the Matrix?

How does Sora work?

OpenAI describes Sora's core mechanics in its technical report, though we don't have all the details yet. To start, Sora is a diffusion model: like AI image generators, it creates videos by starting from a field of static noise and progressively removing that noise until the video matches the requested output.
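That denoising loop can be sketched with a toy example. To be clear, this is not Sora's actual algorithm: real diffusion models use a trained neural network to predict the noise to remove at each step, and the target below is a stand-in purely to show the shape of the process.

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.linspace(0.0, 1.0, 16)  # stand-in for a "clean" signal (a frame)
x = rng.normal(size=16)             # start from pure static noise

for step in range(100):
    # Each step removes a little noise, nudging x toward the clean signal.
    # A real model would predict this update from the noisy input alone.
    x = x + 0.1 * (target - x)

# After enough steps, almost all of the noise is gone.
print(np.abs(x - target).max())
```

After 100 steps the remaining error shrinks by a factor of 0.9^100, so the output is essentially the target; the magic in a real diffusion model is that the "nudge" is learned rather than given.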

Sora learns from units of data called patches: images and videos are compressed into a "lower-dimensional latent space," then decomposed into "spacetime patches," the units the model actually works with. These patches encode both the space and the time of a given video. Sora generates videos in that latent space, and a decoder maps the result back to pixel space to produce the final output.
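For a rough sense of what a "spacetime patch" is, here's a minimal sketch that carves a video tensor into flattened patch vectors. The patch sizes and layout are illustrative assumptions, not Sora's actual configuration (which OpenAI hasn't published), and a real system would do this in the compressed latent space rather than on raw pixels.

```python
import numpy as np

def to_spacetime_patches(video, pt=2, ph=4, pw=4):
    """Split a video tensor (T, H, W, C) into flattened spacetime patches.

    Each patch covers `pt` consecutive frames and a `ph` x `pw` pixel
    region, so a patch spans both space and time. Dimensions here are
    made up for illustration.
    """
    T, H, W, C = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    # Carve the video into a grid of (pt, ph, pw) blocks...
    v = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # ...group the axes so each block is contiguous, then flatten each
    # block into one token-like vector the model can attend over.
    v = v.transpose(0, 2, 4, 1, 3, 5, 6)
    return v.reshape(-1, pt * ph * pw * C)

# A tiny fake "video": 8 frames of 16x16 RGB noise.
video = np.random.rand(8, 16, 16, 3)
patches = to_spacetime_patches(video)
print(patches.shape)  # (64, 96): 4*4*4 patches, each 2*4*4*3 values long
```

The payoff of this representation is that one sequence of patch tokens can describe videos of varying length, resolution, and aspect ratio.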

Interestingly, the company doesn't say exactly where these training photos and videos come from. It does say that Sora builds on research from its DALL-E and GPT models, and that it used the same re-captioning technique from DALL-E 3 to train the model on highly descriptive user prompts.

What else can Sora do?

While generating videos from text prompts is clearly Sora's headline feature, OpenAI says it can also generate videos from still images. Researchers at Apple are working on a similar process with their Keyframer tool.

Sora can also extend an existing video, either forward or backward in time. OpenAI demonstrated this with a video of a San Francisco train, generating three different 15-second extensions to the beginning of the clip. The three videos start out differently, but all converge on the same ending. The same technique can also produce "perfect loops."

OpenAI also highlights Sora's simulation capabilities. In its videos, people and objects stay consistent and interact the way they should. Sora can remember actions that change its "world," like someone drawing on paper, so it doesn't lose track of people and objects when they leave the frame. It can even, um, generate Minecraft on the fly, controlling the player while simultaneously modeling the world around them.

Is Sora perfect? No

To its credit, OpenAI acknowledges that Sora currently has problems. The company says the model may fail to accurately represent the physics of a "complex scene," or certain instances of cause and effect. One example OpenAI gives is a video of someone taking a bite of a cookie, after which the cookie shows no bite mark. Shattering glass is apparently another thing Sora struggles to render.

The company also says Sora can mix up "spatial details" in your prompt, such as confusing left and right, and may fail to precisely depict events that unfold over time.

You can see these problems in videos OpenAI presents as examples of Sora's "mistakes." For a prompt asking for a person running, Sora generates a man running the wrong way on a treadmill. For another asking for "archeologists" to discover a plastic chair in the desert, the "archeologists" pull a sheet out of the sand, and the chair materializes out of thin air. That one is especially weird to watch.

The future will be here soon

Browsing Sora's announcement page, you might feel a bit frightened. But keep in mind that these are the best videos Sora can currently produce, cherry-picked to show off its capabilities (the ones OpenAI flags as mistakes excluded).

After the announcement, Sam Altman took to Twitter and asked users to reply with prompts for him to feed to Sora. Of the eight results he tweeted out, I don't think any would have made the announcement page; they looked like first drafts of a 2000s direct-to-DVD cartoon. The first attempt at "A half duck, half dragon flies through a beautiful sunset with a hamster dressed in adventure gear on its back" was so bad it was funny.

A prompt for "two golden retrievers podcasting on top of a mountain" returned something strange: it looks like the objects were clipped from stock videos and hastily pasted together. It doesn't look "real" so much as Photoshopped, which again makes me wonder what Sora was trained on.

These quick demos made me feel a little better, but not much. Sora isn't yet at the point where everything it makes is indistinguishable from real life. Most likely, OpenAI sifted through tens of thousands of results and chose the most impressive ones for its announcement.

But that doesn't mean Sora isn't terrifying. It won't take much more research or time for it to improve. I mean, this is where AI video generation was 10 months ago; I can only imagine what Sora will be spitting out 10 months from now.

OpenAI says it's taking the right steps here: it's currently working with red teamers on harm-reduction research, and it plans to watermark Sora-generated content, as other AI programs do, so you can always tell when something was made with OpenAI's technology.

But really, these videos are too good. We're past the era of clips that fool you at first glance but obviously look fake on closer inspection. It's hard to believe some of these videos aren't real. If this stuff can impress people like me who work with AI all day, how is the average person on Facebook supposed to know that a realistic-looking video was machine-made?

Not to get too dark, but there are major elections in more than 50 countries this year. In the US, AI has already been used to try to deceive voters, and that was just with audio. I think we'll see some of the most convincing multimedia scams and disinformation campaigns ever this year, so make sure your lie detectors are at full strength.

You'd better hope these watermarks work, folks. It's going to be an exciting year.
