Even at first glance, there’s something off about the body on the street. The white sheet it’s under is a little too clean, and the officers’ movements are completely devoid of purpose. “We need to clear the street,” one of them says with a firm hand gesture, though her lips don’t move. It’s AI, alright. But here’s the kicker: my prompt didn’t include any dialogue.
Veo 3, Google’s new AI video generation model, added that line all by itself. Over the past 24 hours I’ve created a dozen clips depicting news reports, disasters, and goofy cartoon cats with convincing audio, some of which the model invented all on its own. It’s more than a little creepy and way more sophisticated than I had imagined. And while I don’t think it’s going to propel us to a misinformation doomsday just yet, Veo 3 strikes me as an absolute AI slop machine.
Google announced Veo 3 at I/O this week, highlighting its most important new capability: generating sound to go with your AI video. “We’re entering a new era of creation,” Google’s VP of Gemini, Josh Woodward, explained in the keynote, calling it “incredibly realistic.” I wasn’t entirely sold, but then, a few days later, I had Veo 3 generate a video of a news anchor announcing a fire at the Space Needle. All it took was a basic text prompt, a few minutes, and an expensive subscription to Google’s AI Ultra plan. And you know what? Woodward wasn’t exaggerating. It’s realistic as hell.
I tried the news anchor prompt after seeing what Alejandra Caraballo, a clinical instructor at Harvard Law School’s Cyberlaw Clinic, was able to produce. One of her clips features a news anchor announcing the death of US Secretary of Defense Pete Hegseth. He’s not dead, but the clip is incredibly convincing. A post including a string of videos with AI-generated characters protesting the prompts used to create them has 50,000 upvotes on Reddit. The scenes include disasters, a woman in a hospital bed using a breathing tube, and a character being threatened at gunpoint, all with spoken dialogue and realistic background sounds. Real lighthearted stuff!
Maybe I’m being naive, but after playing around with Veo 3 I’m not quite as concerned as I was at first. For starters, the obvious guardrails are in place. You can’t prompt it to create a video of Biden tripping and falling. You can’t have a news anchor announce the assassination of the president, or even generate a video of a T-shirt-and-chain-wearing tech company CEO laughing while dollar bills rain down around him. That’s a start.
That said, you can generate some troubling shit. Without any clever workarounds I prompted Veo 3 to create a video of the Space Needle on fire. Starting with my own photo of Mount Rainier, I generated a video of it erupting with smoke and lava. Coupled with a clip of a news anchor announcing said disaster, I can see how you could seed some mischief real easy with this tool.
Here’s the better news: it doesn’t seem to be a ready-made deepfake machine. I gave it a couple of photos of myself and asked it to generate a video with specific dialogue and it wouldn’t comply. I also asked it to bring a pair of giant boots in a photo to life and have them walk out of the scene; it managed one boot stomping across the sidewalk with some comical crunching noises in the background.
I had an easier time generating videos when my prompts were less specific, which is how I confirmed something my colleague Andrew Marino pointed out: Veo 3 is great at creating the kind of lowest-common-denominator YouTube content aimed at kids.
If you’ve never been subjected to the endless pit of garbage on YouTube Kids, let me enlighten you. Imagine watching the worst 3D rendering of a monster truck driving down a ramp, landing in a vat of colored paint. Next to it, another monster truck drives down another ramp into another vat of paint, this time in a different color. Now watch that again. And again. And again. There are hours of this stuff on YouTube designed to mesmerize toddlers. These videos are usually harmless, just empty calories designed to rack up views that make Cocomelon look like Citizen Kane. In about 10 minutes with Veo 3, I threw together a clip following the same basic formula, complete with jaunty background music. But the clip that’s even more troubling to me is the two cartoon cats on a pier.
I thought it would be funny to have the cats complain to each other that the fish aren’t biting. In just a couple of minutes, I had a clip complete with two cats and some AI-generated dialogue that I never wrote. If it’s this easy to make a 10-second clip, stretching it out to a seven-minute YouTube video would be trivial. In its current form, clips revert to Veo 2 when you try to extend them into longer scenes, which removes the audio. But the way that Google has been pushing these tools forward relentlessly, I can’t imagine it’ll be long before you can edit a full feature-length video with Veo 3.
Honestly, I wonder if this kind of use for AI-generated video is a feature and not a bug. Google showed us some fancy AI-generated video from real filmmakers, including Eliza McNitt, who’s working with Darren Aronofsky on a new film with some AI-generated elements. And sure, AI video could be an interesting tool in the right hands. But I think what we’re most likely to see is a proliferation of the kind of bland imagery that AI is so good at generating, this time in stereo.
