Published On: Tue, Oct 10th, 2017

DeepMind’s WaveNet Technology Makes Google Assistant’s New Male and Female Voices Sound More Realistic

Google recently rolled out male and female voice options for Google Assistant in English. A welcome choice for users who have voice preferences for virtual assistants. The new voices for the Assistant sound more realistic, thanks to a deep neural network for sound synthesis by Alphabet's DeepMind division.

In 2016, Alphabet's lab introduced the WaveNet deep neural network for "generating raw audio waveforms that is capable of producing better and more realistic-sounding speech than existing techniques."


In the span of 12 months, the team tested this "computationally intensive" research prototype on consumer products, the first being Google Assistant voices for US English and Japanese. The new model can produce waveforms 1,000 times faster, with better resolution and fidelity than the original.

Computational Approach

Alphabet's computational approach to text-to-speech is a big leap forward compared to previous methods, which involved voice artists recording a huge database of sounds that were then stitched together. On the downside, that concatenative process could result in unnatural sounds that are difficult to modify, as the whole database needs tweaking whenever new changes such as intonations or emotions are introduced. The computational approach also takes far less time to process sounds than the previous method.
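The older stitching method described above can be illustrated with a toy sketch (a simplified illustration only; the `unit_db` lookup table and `concatenate_units` helper are hypothetical, and real systems select among many recorded variants per unit):

```python
import numpy as np

# Hypothetical mini "database" of prerecorded units: placeholder
# arrays standing in for audio clips of individual phonemes.
unit_db = {
    "h": np.full(4, 0.1),
    "i": np.full(6, 0.2),
}

def concatenate_units(phonemes, db):
    """Concatenative synthesis in miniature: look up each unit's
    recording and stitch the clips end to end. Changing the style
    (intonation, emotion) would require re-recording the database."""
    return np.concatenate([db[p] for p in phonemes])

speech = concatenate_units(["h", "i"], unit_db)  # 4 + 6 = 10 samples
```

The brittleness the article mentions is visible here: every stylistic change means replacing the recorded clips themselves, not adjusting a model.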

Google Assistant Waveform

DeepMind's computational approach, introduced in 2016, included a "deep generative model that can create individual waveforms from scratch."


It enabled the inclusion of natural sounds that sync better and present natural accents, intonation, and even skeuomorphic sounds like "lip smacks."

In a blog post, DeepMind explains:

It was built using a convolutional neural network, which was trained on a large dataset of speech samples. During this training phase, the network determined the underlying structure of the speech, such as which tones followed each other and which waveforms were realistic (and which were not). The trained network then synthesised the voice one sample at a time, with each generated sample taking into account the properties of the previous sample.

The resulting voice contained natural intonation and other features such as mouth smacks. Its "accent" depended on the voices it had trained on, opening up the possibility of creating any number of unique voices from blended datasets. As with all text-to-speech systems, WaveNet used a text input to tell it which words it should generate in response to a query.
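The one-sample-at-a-time process DeepMind describes can be sketched as a toy autoregressive loop (a simplified illustration, not WaveNet's actual architecture; `toy_model` and `generate_autoregressive` are hypothetical stand-ins for the trained network):

```python
import numpy as np

def generate_autoregressive(model, n_samples, context_len=16, seed=0):
    """Emit a waveform one sample at a time: each new sample is
    predicted from the samples generated before it, loosely mirroring
    WaveNet's autoregressive synthesis."""
    rng = np.random.default_rng(seed)
    samples = np.zeros(n_samples)
    for t in range(n_samples):
        context = samples[max(0, t - context_len):t]  # recent history only
        samples[t] = model(context, rng)
    return samples

def toy_model(context, rng):
    # Stand-in for a trained network: a damped echo of the previous
    # sample plus a little noise (illustrative only).
    prediction = 0.9 * context[-1] if len(context) else 1.0
    return prediction + 0.01 * rng.standard_normal()

waveform = generate_autoregressive(toy_model, 100)
```

The loop makes the trade-off clear: quality comes from conditioning every sample on its predecessors, which is why the original prototype was "computationally intensive" and why the 1,000x speedup mattered for shipping it in Google Assistant.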

You can check out DeepMind's latest blog post on the new approach for the male and female voices on Google Assistant.
