Google DeepMind Unveils New AI Model V2A, Which Generates Soundtracks and Dialogue for Videos
by Aparajita Sambhaw - June 19, 2024, 3:21 pm

Google DeepMind has unveiled V2A (Video-to-Audio) in a recent blog post. The AI model combines the visual content of a video with text prompts to generate a matching audio track.

The technology aims to change how AI-generated videos are produced and watched by adding music, realistic sound effects, and dialogue synchronized with the on-screen action.

V2A is designed to work with Veo, Google’s text-to-video model unveiled at Google I/O 2024, so that videos generated with Veo can be given a matching soundtrack instead of remaining silent.

V2A can add audio to a wide range of footage, from videos generated with Veo to silent films and archival clips, giving older material new life. It can also generate an unlimited variety of soundtracks for the same video.

Users can steer the output with ‘positive prompts’, which guide the model toward desired sounds, and ‘negative prompts’, which guide it away from unwanted ones. Each generated audio clip is also watermarked with SynthID, Google DeepMind’s technology for identifying AI-generated content.
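The announcement does not say how V2A applies positive and negative prompts internally. Many text-conditioned diffusion models handle negative prompts with a variant of classifier-free guidance, and the sketch below assumes that approach purely for illustration: at each denoising step the model predicts noise once conditioned on the positive prompt and once on the negative prompt, and the sampler pushes the estimate toward the former and away from the latter. The function `guided_noise_estimate` and its `guidance_scale` parameter are hypothetical, not part of any published V2A interface.

```python
import numpy as np


def guided_noise_estimate(eps_positive: np.ndarray,
                          eps_negative: np.ndarray,
                          guidance_scale: float) -> np.ndarray:
    """Combine two noise predictions, classifier-free-guidance style.

    Hypothetical helper for illustration (not a V2A API):
    eps_positive   -- noise predicted with the positive prompt as conditioning
    eps_negative   -- noise predicted with the negative prompt as conditioning
    guidance_scale -- 1.0 returns eps_positive unchanged; larger values
                      extrapolate further away from the negative prompt
    """
    return eps_negative + guidance_scale * (eps_positive - eps_negative)


# Toy two-element "noise predictions" standing in for real model outputs.
pos = np.array([1.0, 0.0])
neg = np.array([0.0, 1.0])
print(guided_noise_estimate(pos, neg, 2.0))  # [ 2. -1.]
```

With a scale above 1, the combined estimate overshoots the positive prediction in the direction opposite the negative one, which is what lets a negative prompt suppress unwanted sounds rather than merely being ignored.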

V2A is built on a diffusion model trained on a combination of videos, audio, and dialogue transcripts. Although the model is highly capable, it was trained on a limited number of videos, which can occasionally lead to flaws or inconsistencies in the generated audio. Because of this limitation, and as a precaution against misuse, Google has no immediate plans to release V2A to the public.
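The article does not describe V2A’s sampling procedure. As a generic illustration of what a diffusion model does at generation time, the sketch below runs standard DDPM-style ancestral sampling: it starts from random noise and repeatedly removes predicted noise until a clean signal remains. Since there is no trained network to call, a hypothetical `oracle_noise_predictor` computes the exact noise from a known toy `target` array; in a real video-to-audio system, a learned network would make that prediction from the noisy audio, the timestep, and conditioning such as video features and text prompts. The schedule endpoints follow the original DDPM paper; the step count is arbitrary and none of these values are V2A’s.

```python
import numpy as np


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear noise (beta) schedule plus the derived alpha terms."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def oracle_noise_predictor(x_t, t, target, alpha_bars):
    """Hypothetical stand-in for a trained network.

    Returns the exact noise that turns `target` into x_t at step t.
    A real model would predict this from x_t, t and its conditioning.
    """
    return (x_t - np.sqrt(alpha_bars[t]) * target) / np.sqrt(1.0 - alpha_bars[t])


def sample(target, num_steps=50, seed=0):
    """DDPM ancestral sampling driven by the oracle predictor."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = rng.standard_normal(target.shape)  # start from pure Gaussian noise
    for t in reversed(range(num_steps)):
        eps = oracle_noise_predictor(x, t, target, alpha_bars)
        # Posterior mean: strip out the predicted noise component.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Re-inject a smaller amount of fresh noise before the next step.
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(target.shape)
        else:
            x = mean  # final step adds no noise
    return x


target = np.sin(np.linspace(0, 2 * np.pi, 8))  # toy "clean audio" signal
result = sample(target)
print(np.allclose(result, target))  # True
```

Because the oracle knows the clean signal, the final step recovers `target` up to floating-point error; with a trained network, the quality of the result depends entirely on how well the network’s predictions match, which is where limited training data shows up as flawed output.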

Google DeepMind’s unveiling of V2A marks a notable advance in video creation technology. Many current video generation systems produce only silent output; by adding sound and dialogue, V2A addresses that gap and makes videos more immersive.

V2A is still under development and not available to the public, but it shows considerable promise for video production.