Attention on generative AI has focused chiefly on text-based interfaces that generate text, images, and more, but development is increasingly tilting toward voice. Google announced that Chirp 3, its family of speech-to-text and HD text-to-speech models, will be added to its Vertex AI development platform starting next week.
Last week, Google quietly announced that Chirp 3 would gain eight new voices across 31 languages. Use cases for the platform include building voice assistants, creating audiobooks, and developing support agents and voice-overs for videos. The news was announced at an event at Google's DeepMind offices in London.
Google's efforts come as others also leap forward with their voice AI work. Last week, Sesame, the startup behind the viral, very realistic-sounding "Maya" and "Miles" AI voices, announced the launch of its model for developers to build their own customized apps and services on top of its tech.
Notably, there will be usage restrictions around Chirp 3 to try to keep a handle on misuse. “We’re just working through some of these things with our safety team,” said Thomas Kurian, CEO of Google Cloud, at a news event today.
ElevenLabs is among the major startups that have raised hundreds of millions in funding to expand their work in AI voice services.
The move brings Chirp 3 into the same stable as newer versions of Gemini, its flagship LLM currently in testing, its image-generation model Imagen, and its pricey Veo 2 video-generation tool.
It remains to be seen whether what Google is releasing with Chirp 3 will be as "realistic" as some of the other AI efforts to create "human" voices (Sesame's work stands out in particular). But as Demis Hassabis, the CEO of DeepMind, emphasized, this remains a marathon, not a sprint.
“In the near term… this idea that [AI is] a silver bullet to everything in the next couple of years, I don’t see that happening just yet…think we’re still quite a few years away from something like AGI happening,” he said. “It will change things… over the next decade, so the medium to longer term. It’s one of those interesting moments in time.”
Google launched Vertex AI in 2021 as a platform for developers to build machine learning services in the cloud. That was, of course, well before the explosion of interest in AI, specifically generative AI, that came with the launch of OpenAI’s GPT services.
Since then, the company has been leaning into Vertex AI, partly as it plays catch-up to companies like Microsoft and Amazon, which are also building generative AI tooling for developers. In addition to building generative AI on top of Gemini, developers can use Vertex AI to classify data, train models, and deploy models to production. It will be interesting to see whether Google moves to expand its walled garden to models beyond those it created itself.
Google has been building "Chirp" voice services for years; the name originated as a code name for its early efforts to compete against Amazon's Alexa service.