First Impressions of OpenAI's Sora 2 + App
OpenAI just released a new version of Sora, their video generation model. I spammed reload on a Reddit thread last night to get an invite code, and quickly hit the rate limit (it was 71 videos)! Here are my first impressions of the new app and model.
The main features of Sora 2 (and the new app) are that you can make short-form (like 10-second) videos just by prompting, and you can also insert yourself into videos. This is pretty interesting because previously you’d have to either train a LoRA (like I did) or maybe use a static image of yourself as a starting frame, which would inevitably drift away from looking like you.
Onboarding
The onboarding of the app is pretty interesting. To make a “cameo,” which is the term for inserting your likeness into a video, you need to record a short video of yourself saying some numbers and looking around.
At first I thought the numbers were just to verify that you were recording a live video, but I think it’s also to capture your voice for voice cloning. The voice that the AI uses for me sort of sounds like me but not quite. It’s probably because I was mumbling the numbers.
Once you upload the video, you can just tag yourself in videos that you want to generate. You can also control who’s allowed to make videos with your likeness through permissions, like if you only want friends to be able to make cameos of you.
Generation
I was interested in seeing how good the video generation was and how creative you could be with the app. I think there must be some prompt-rewriting magic happening after your initial prompt, because you can use a really vague prompt and still get an interesting video with an actual plot. I just prompted “username farting” and it came up with this gem:
I also noticed that the notifications the app sends out describe the video in a way that sometimes includes more info than your initial prompt.
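To make that guess concrete, here’s a minimal sketch of the kind of prompt-rewriting step I imagine sits between you and the video model. This is pure speculation on my part; the function names and the instruction text are made up for illustration, not anything OpenAI has documented.

```python
# Speculative sketch of a prompt-rewriting stage (not OpenAI's actual code).
# The idea: a language model expands a vague prompt into a full scene
# description before it ever reaches the video model.

from typing import Callable

REWRITE_INSTRUCTIONS = (
    "Expand this short video prompt into a detailed 10-second scene: "
    "describe the setting, camera movement, lighting, and a simple "
    "beginning, middle, and end."
)

def rewrite_prompt(user_prompt: str, llm: Callable[[str], str]) -> str:
    """Turn something vague like 'username farting' into a full shot description."""
    return llm(f"{REWRITE_INSTRUCTIONS}\n\nUser prompt: {user_prompt}")

def generate(user_prompt: str, llm: Callable[[str], str],
             video_model: Callable[[str], bytes]) -> bytes:
    detailed_prompt = rewrite_prompt(user_prompt, llm)   # the "magic" step
    return video_model(detailed_prompt)                  # actual video generation
```

A rewriting step like this would also explain why the notifications describe details I never typed.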
OpenAI is obviously going to be pretty risk-averse here when it comes to content moderation. When you enter a prompt that trips the guardrails, you’ll get a message that your generation failed. The annoying thing is that you can’t really tell whether a prompt will run into a guardrail until you try generating the video. I’ve used identical prompts where one generation works fine and another hits a content block.
For example, I tried recreating that AI boulder video people are talking about, where some lady busts through a Chinese glass bridge with a boulder and people fall into the water. I was able to prompt my way to something similar. One time it failed, and I just retried with the exact same prompt and it worked. There must be a few stages where the content is checked: at least one before generation, and maybe one that watches the video once it’s finished. I tried remixing this video with Sam Altman instead of me and it kept failing, even though the only change was the person with the boulder.
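Here’s a tiny sketch of the kind of multi-stage pipeline that would explain that flakiness. Again, this is guesswork: the stage names and the idea of sampling frames are assumptions I’m making, not anything OpenAI has confirmed.

```python
# Guess at a two-stage moderation flow (speculation, not OpenAI's real pipeline).

from typing import Callable, Iterable, Optional

def moderated_generation(
    prompt: str,
    prompt_check: Callable[[str], bool],           # stage 1: text classifier
    video_model: Callable[[str], Iterable[bytes]], # returns frames of the clip
    frame_check: Callable[[bytes], bool],          # stage 2: vision classifier
) -> Optional[list[bytes]]:
    # Stage 1: reject the prompt before spending any compute.
    if not prompt_check(prompt):
        return None  # shows up to the user as "generation failed"

    frames = list(video_model(prompt))

    # Stage 2: "watch" the finished video by checking its frames.
    if any(not frame_check(frame) for frame in frames):
        return None  # also shows up as "generation failed"

    return frames
```

If either classifier is noisy on borderline content, you’d see exactly what I saw: the same prompt passing one run and failing the next, and a cameo swap from me to Sam Altman tripping a check even though the prompt didn’t change.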
This brings up another point I want to make about Sora and OpenAI in general. I’ve been noticing this for a while, but it seems kinda funny and weird to me that Sam Altman is increasingly the face and brand of OpenAI. I remember seeing a bro’d-out Sam in his multiple colorful polo shirts, showing off Loopt at the Apple keynote in 2008. I also remember meeting him once at a company offsite when I worked at FarmLogs (he was an investor and I think running Y Combinator at the time, and FarmLogs was a YC company). In addition to being an investor/tech startup builder guy, I guess he also wants everyone to know who he is.
Sama (as he’s called) is available to cameo in pretty much any video you want to make. There are some viral ones of him shoplifting GPUs from Target for Sora inference, and a bunch of scammy ones where people make him beg for likes in exchange for invite codes. I can’t imagine being so full of myself that I’d want to sign up for this kind of publicity, but I guess Sama is just built different. He does seem like a bit of a psychopath, but whatever, now he’s the de facto mascot of OpenAI!
Inconsistent Moderation?
So I keep running into issues where I can’t moonwalk or make myself a chef who’s being absolutely destroyed by Guy Fieri’s criticism on Diners, Drive-Ins and Dives, but for some reason there are like a million videos of Pikachu on Sora. There are also lots of videos of GTA V and other copyrighted stuff.
It’s been interesting to see how people get around these guardrails, though. For example, I can’t say that I want a video of myself as Kramer in Seinfeld. But I can say that I’m busting through Jerry’s apartment door in a 90s sitcom and saying “Giddyup, Jerry!” which will work. I did notice though that the actors portraying Jerry only bear a passing resemblance to Jerry Seinfeld (kind of like the episode inside an episode of Seinfeld).
Other Interesting Stuff
I recorded my cameo proof video in front of a brick fireplace wearing a striped shirt. I noticed that in most of my videos I’m wearing that striped shirt, and if I don’t add a lot of detail to my prompt, I’ll end up with a video set in that exact same spot in my house. I guess it isn’t surprising that the system associates me with the location of the recording, but it’s also kind of weird to see my home show up in videos.
If you watch enough videos on your feed, you’ll probably also notice that many of the Sam Altman videos are just him in front of a white background. I’m guessing that matches the environment where he recorded his own cameo setup video.
The Feed
I guess I should also talk about the social aspect of Sora. One of the weirdest things about this product launch is that it’s a social app. You can add “friends” who can make videos of you, and you can even have multiple cameos in a single video. I haven’t added any “friends” yet but eventually I might get there.
There’s also a “For You” feed which is currently inundated with socially engineered clickbait, like videos claiming that if you double tap (which likes the post) you’ll see some special emoji. I’m honestly surprised that OpenAI didn’t consider that people would try to game the system this way. It’s a bad user experience because these videos are boring and often use the exact same script, just with different people reading it. True AI slop.
I did manage to find some interesting videos, where people have the same instinct to make fun of Sam Altman and his double polo shirts. One of the first videos I made was of him announcing Loopt 2 wearing 5 polo shirts with giant collars! So there are some gems but they’re truly hidden behind the slop.
I guess if you want you can follow my account, it’s hungtruongy because somehow I didn’t get “hungtruong” in time.
Conclusion
Sora 2 is really a huge step forward in AI video generation. It’s incredibly impressive, which is why it’s kind of tragic that this technology is currently just being used to make shitty slop. I’m hopeful that through moderation and algorithming, it could actually end up being an interesting platform.
I’ve also read that the true value of this technology might be in ads. Imagine how much brands would love to offer you a cameo of yourself wearing their clothing so you’ll buy it. That makes the announcement a few days ago about an AI agent purchase protocol a lot more obvious in hindsight.
I had a lot of fun making videos of myself (and hitting the rate limits). I’m sure other people will too. I’m guessing some creative people will find a really cool use for this, or it’ll turn into an AI slop graveyard. Either way, it should be interesting!