Artificial intelligence has seen a new development with the release of a new AI voice generator: two students with reportedly little prior AI expertise have built an openly available model that can generate podcast-style clips, similar to Google’s NotebookLM.
TechPolyp notes that the market for synthetic speech tools is growing rapidly. ElevenLabs is one of the most significant players, but it faces strong contenders such as PlayAI and Sesame. Investors clearly believe in the potential of these tools: PitchBook records show that startups developing AI voice generator tools raised over $398 million in venture capital funding last year.
Toby Kim, a Korea-based co-founder of Nari Labs, the group behind the newly released model, started learning about speech AI only three months ago alongside his fellow co-founder. He said NotebookLM inspired them to build a model that offers more control over generated voices and “freedom in the script”.
Kim confirms they used Google’s TPU Research Cloud program, which gives researchers free access to the company’s TPU AI chips, to train Nari’s model, Dia. Weighing in at 1.6 billion parameters, Dia generates dialogue from a script and lets users customize speakers’ tones, insert disfluencies, and add coughs, laughs, and other nonverbal cues.
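To illustrate the script-driven workflow, here is a minimal Python sketch of how a two-speaker script with nonverbal cues might be fed to the model. The import path, speaker-tag syntax, cue notation, repository id, and method names are assumptions made for illustration rather than confirmed details of Nari’s API; the project’s GitHub README documents the actual interface.

```python
# Hypothetical sketch of generating a two-speaker clip from a tagged script.
# The import path, class name, repo id, tag syntax, and method signatures
# are assumptions for illustration; see Nari Labs' GitHub README for the
# actual interface.
from dia.model import Dia  # assumed import path

# A script with two speakers plus nonverbal cues such as coughs and laughs.
script = (
    "[S1] Welcome back to the show. (laughs) Today we're talking about "
    "open voice models. "
    "[S2] Right, and honestly... (coughs) there's a lot to cover."
)

model = Dia.from_pretrained("nari-labs/Dia-1.6B")  # assumed Hugging Face repo id
audio = model.generate(script)                     # assumed to return raw audio samples
```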
What’s the Performance Level of the Dia AI Voice Generator?
Generally, models with more parameters perform better; parameters are the internal variables a model uses to make predictions. Dia is available on the AI dev platform Hugging Face and on GitHub, and it can run on most modern PCs with at least 10GB of VRAM. The model generates a random voice unless it is prompted with a description of the intended style, and it can also clone a person’s voice.
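For readers who want to try the open weights locally, the sketch below shows one way to check whether a machine clears the roughly 10GB VRAM bar before downloading the checkpoint from Hugging Face. The PyTorch and huggingface_hub calls are standard, but the repository id is an assumption used for illustration.

```python
# Sketch: verify local GPU memory before fetching the Dia weights.
# The repository id "nari-labs/Dia-1.6B" is assumed for illustration.
import torch
from huggingface_hub import snapshot_download


def has_enough_vram(min_gb: float = 10.0) -> bool:
    """Return True if the first CUDA device reports at least `min_gb` of memory."""
    if not torch.cuda.is_available():
        return False
    total_bytes = torch.cuda.get_device_properties(0).total_memory
    return total_bytes / (1024 ** 3) >= min_gb


if has_enough_vram():
    local_dir = snapshot_download("nari-labs/Dia-1.6B")  # assumed repo id
    print(f"Model weights downloaded to {local_dir}")
else:
    print("This machine likely has too little VRAM to run Dia comfortably.")
```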
You can test Dia through Nari’s web demo, which generates two-way chats on any subject. The quality of the voices is competitive with other available tools, and the voice cloning function is among the easiest to use.
Dia performs like a top AI voice generator, but it offers little in the way of safeguards: it would be trivially easy to craft disinformation or a scam recording. In an effort to prevent misuse, Nari’s project pages discourage using the model to impersonate others, deceive people, or otherwise engage in illicit campaigns, and the company disclaims responsibility for any misuse.
How Was Dia Trained?
As of the time of filing this report, Nari hasn’t disclosed which data it scraped to train Dia. However, a commenter on Hacker News noted that one sample sounds like the hosts of NPR’s “Planet Money” podcast, suggesting that Toby and his team may have used copyrighted material to train the AI voice generator. Training models on copyrighted content is now commonplace, but it remains a legally dubious practice: some AI companies claim that fair use shields them from liability, while rights holders assert that fair use doesn’t apply to training. Meta and other companies are already in court facing litigation of this nature.
Kim says Nari plans to build a synthetic voice platform with a “social aspect” on top of Dia and future, larger models. The team also plans to release a technical report for Dia and to expand the model’s support to languages beyond English.