Alibaba Group Holding is working on a video-generating tool called Tora that builds on OpenSora, an open-source counterpart to OpenAI’s Sora, in the latest effort by the Chinese tech giant to develop video artificial intelligence (AI) tools.

Tora, a video-generation framework that adopts OpenSora as its foundational model, was described in a paper released last week by a group of five researchers from Alibaba. Alibaba owns the South China Morning Post.

According to the paper, which was published on the repository website arXiv, the Tora framework achieved a breakthrough by building on the Diffusion Transformer (DiT) architecture, which underpins Sora, the text-to-video model launched by OpenAI in February.

The researchers claim to have developed the first “trajectory-oriented DiT framework for video generation”, which they say makes generated movements follow specified trajectories precisely while replicating the dynamics of the physical world.

“We adapted OpenSora’s workflow to transform raw videos into high-quality video-text pairs and leverage an optical flow estimator for trajectory extraction,” they said.

The Alibaba booth at the World Artificial Intelligence Conference (WAIC) in Shanghai, July 6, 2023. Photo: Reuters

The paper references a series of videos that show different objects – from a wooden sailing boat in a river to men cycling on the highway – moving in accordance with designated trajectories. Tora is capable of generating videos guided by trajectories, images, text, or a combination of the three, according to the researchers.

The researchers, who labelled the project as “ongoing”, did not state when the new tool would be available for public use.

The move by Alibaba marked the latest effort by the Hangzhou-based tech giant to launch Sora-like video-generating tools, as Chinese companies rush to gain a foothold in the AI video field.

In July, Chinese start-up Shengshu AI rolled out its text-to-video tool Vidu, which allows registered users to generate clips of four or eight seconds in length, becoming the latest player in the country to offer such services to the public following Zhipu AI and Kuaishou Technology.

That came a few days after Zhipu AI, one of China’s four new “AI Tigers”, debuted its Ying video generation model, which accepts both text and image prompts and generates six-second video clips in around 30 seconds.

Tora is not, however, Alibaba’s first step in the field of AI video generation. In February, the company unveiled an AI video-generation model called Emote Portrait Alive, or EMO.

The model, dubbed an “expressive audio-driven portrait-video generation framework”, can turn a single still reference image and audio vocal sample into an animated avatar video with facial expressions and poses.

The research paper did not mention whether Tora will be linked with EMO or Tongyi Qianwen, Alibaba’s self-developed family of large language models.

This article originally appeared in the South China Morning Post (SCMP). Copyright © 2024 South China Morning Post Publishers Ltd. All rights reserved.
