AI Voice Generator Dia Brings Custom Podcast-Style Speech to Everyone

Two students who reportedly lack extensive AI expertise have released an openly available AI voice generator. The model can generate podcast-style clips similar to those produced by Google’s NotebookLM.

TechPolyp notes that the market for synthetic speech tools is growing rapidly as more tech-savvy developers enter the field. ElevenLabs is one of the most significant players, and other contenders include PlayAI and Sesame. Investors see strong potential in these tools: according to PitchBook, startups developing AI voice generator tools raised over $398 million in venture capital funding last year.

Toby Kim, a Korea-based co-founder of Nari Labs, the group behind the newly released model, said he and his colleague started learning about speech AI three months ago. He noted that NotebookLM inspired them to create a model that offers more control over generated voices and more “freedom in the script”.

Kim confirms they used Google’s TPU Research Cloud program, which gives researchers free access to the company’s TPU AI chips, to train Nari’s model, Dia. Weighing in at 1.6 billion parameters, Dia generates dialogue from a script. It lets users customize speakers’ tones and insert disfluencies, coughs, laughs, and other nonverbal cues.
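To make the idea of a script with speakers and nonverbal cues concrete, here is a minimal sketch of how such a script might be assembled in Python. The `[S1]`/`[S2]` speaker tags and parenthesized cues are an assumed convention used for illustration; the article does not specify Dia’s actual input format, and `Line` and `build_script` are hypothetical helpers, not part of any Nari library.

```python
from dataclasses import dataclass


@dataclass
class Line:
    speaker: int   # 1-based speaker index (assumed convention)
    text: str      # what the speaker says, disfluencies included
    cue: str = ""  # optional nonverbal cue, e.g. "laughs" or "coughs"


def build_script(lines):
    """Join dialogue lines into one tagged script string (assumed format)."""
    parts = []
    for line in lines:
        cue = f" ({line.cue})" if line.cue else ""
        parts.append(f"[S{line.speaker}] {line.text}{cue}")
    return " ".join(parts)


script = build_script([
    Line(1, "Welcome back to the show."),
    Line(2, "Um, thanks for having me.", cue="laughs"),
])
print(script)
```

A string like this would then be passed to the model as the text to voice; check the project’s own documentation for the tags and cues it actually accepts.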

What’s the Performance Level of Dia—AI Voice Generator?

Parameters are the internal variables that models use to make predictions, and models with more parameters generally perform better. Dia has 1.6 billion of them. The model is available from the AI dev platform Hugging Face and from GitHub, and it can run on most modern PCs with at least 10GB of VRAM. Dia generates a random voice unless prompted with a description of an intended style, and it can also clone a person’s voice.
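As a rough back-of-the-envelope check on how parameter count relates to memory, the sketch below estimates the space needed just to hold 1.6 billion weights. The numeric precisions are assumptions; the article does not say which one Dia uses. Activations, audio decoding, and framework overhead come on top of the weights, which may help explain why the stated requirement is higher than these figures.

```python
PARAMS = 1_600_000_000  # parameter count reported in the article

# Bytes per parameter for two common precisions (assumed, not confirmed for Dia)
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gb(n_params, dtype):
    """Memory needed to store the weights alone, in gigabytes (10^9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(PARAMS, dtype):.1f} GB")
# float32: 6.4 GB
# float16: 3.2 GB
```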

You can test Dia through Nari’s web demo by generating two-way conversations on any subject. The quality of the voices is competitive with other tools available, and the voice cloning function is straightforward to use.

Dia performs like a top AI voice generator, but it offers little in the way of safeguards. It would be trivially easy to use it to craft disinformation or a scam recording. In an effort to prevent misuse, Nari’s project pages discourage using the model to impersonate others, deceive, or engage in illicit campaigns. Nari also disclaims responsibility for any misuse.

How Was Dia Trained?

At the time of writing, Nari hasn’t disclosed which data it scraped to train Dia. However, a commenter on Hacker News noted that one sample sounds like the hosts of NPR’s “Planet Money” podcast, which suggests that Kim and his team may have used copyrighted content to train the AI voice generator. Training models on copyrighted content is now commonplace, but it remains a legally dubious practice. Some AI companies claim that fair use shields them from liability, while rights holders assert that fair use doesn’t apply to training. Meta and other companies are facing litigation of this nature.

Kim said Nari plans to create a synthetic voice platform with a “social aspect”, built on Dia and on larger future models. Nari also intends to release a technical report for Dia and hopes to expand the model’s support to languages beyond English.

Adewuyi Omotola
Adewuyi Omotola is a reporter and writer for TechPolyp. His writings are insightful and stand out.
