Midjourney has rolled out the first version of its AI-powered video generation model, allowing users to create short animated clips from images on the platform. The tool is available via the web and through Midjourney’s Discord server, though it currently requires a paid subscription.
This initial version enables users to generate five-second clips from images they either create or upload to the platform. After generating an image, users will now see an “animate” button, which leads them through a prompt-based animation process. By default, the system adds motion using a generic prompt, but a manual option allows for custom descriptions of movement. Users can also input a starting image to guide the animation.
Midjourney lets users extend the animation in four-second increments, up to four times, resulting in a video lasting up to 21 seconds. The platform offers both high and low motion modes, letting users control whether the subject, camera, or both are animated.
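The 21-second ceiling follows from the figures above: a five-second base clip plus up to four extensions of four seconds each. A minimal sketch of that arithmetic, in which the constant and function names are illustrative and not part of any Midjourney interface:

```python
# Figures taken from the article; the names are hypothetical.
BASE_SECONDS = 5        # length of the initial clip
EXTENSION_SECONDS = 4   # length added by each extension
MAX_EXTENSIONS = 4      # maximum number of extensions allowed


def clip_length(extensions: int) -> int:
    """Return the total clip length in seconds after a number of extensions."""
    if not 0 <= extensions <= MAX_EXTENSIONS:
        raise ValueError(f"extensions must be between 0 and {MAX_EXTENSIONS}")
    return BASE_SECONDS + EXTENSION_SECONDS * extensions


# Every reachable clip length, from no extensions to the maximum.
lengths = [clip_length(n) for n in range(MAX_EXTENSIONS + 1)]
print(lengths)  # [5, 9, 13, 17, 21]
```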
Pricing is tied to GPU time, with subscriptions starting at $10 per month for 3.3 hours of fast GPU usage, roughly equivalent to 200 image generations. For video, Midjourney estimates a cost of about eight times that of generating an image, which the company says works out to roughly one image’s worth of cost per second of video.
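For a rough sense of scale, the quoted figures can be turned into per-image and per-video estimates. This is a back-of-the-envelope sketch, not Midjourney's billing logic: it assumes the entry plan's fast GPU time is spread evenly across the 200 images and that a video generation costs exactly eight times an image generation. All names are illustrative.

```python
# Figures quoted in the article; everything derived below is an estimate.
PLAN_PRICE_USD = 10.0        # entry subscription price per month
FAST_GPU_HOURS = 3.3         # fast GPU time included in that plan
IMAGES_PER_PLAN = 200        # approximate image generations per plan
VIDEO_COST_MULTIPLIER = 8    # video cost relative to one image generation

# Assumption: GPU time and price are spread evenly over the 200 images.
gpu_minutes_per_image = FAST_GPU_HOURS * 60 / IMAGES_PER_PLAN
usd_per_image = PLAN_PRICE_USD / IMAGES_PER_PLAN

# Assumption: a video generation costs exactly eight image generations.
gpu_minutes_per_video = VIDEO_COST_MULTIPLIER * gpu_minutes_per_image
usd_per_video = VIDEO_COST_MULTIPLIER * usd_per_image
videos_per_plan = IMAGES_PER_PLAN // VIDEO_COST_MULTIPLIER

print(f"Per image: {gpu_minutes_per_image:.2f} GPU minutes, ${usd_per_image:.2f}")
print(f"Per video: {gpu_minutes_per_video:.2f} GPU minutes, ${usd_per_video:.2f}")
print(f"Video generations per entry plan: {videos_per_plan}")
```

Under these assumptions, the entry plan covers about 25 video generations a month if it is used for nothing else.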
“This is just a stepping stone,” wrote Midjourney founder David Holz in a post announcing the feature. He added that the company is aiming for more advanced models in the future that could enable real-time open-world simulations.
The release comes amid legal tensions. Midjourney is currently facing a lawsuit from Disney and Universal, which have expressed concerns about the company’s video ambitions. The lawsuit describes Midjourney as a “virtual vending machine” for unauthorised reproductions of copyrighted works, specifically pointing to its video generator as a threat. The studios allege that the model’s training likely infringes on their intellectual property.
Midjourney joins a growing list of tech companies, including OpenAI, Google, and Meta, in the AI video generation space. Each has introduced tools that convert text prompts into video, as the race to build next-gen content creation tools accelerates.