Microsoft's latest AI, VALL-E, can clone a voice from only three seconds of audio

This post covers VALL-E, Microsoft's latest AI model, which can clone a person's voice from only three seconds of audio. AI is improving day by day, and we shouldn't underestimate its advances: they are powerful and help us complete specific tasks within seconds.


VALL-E

  • According to Microsoft, VALL-E is designed mainly for voice cloning and text-to-speech synthesis, and it can clone a voice from an audio snippet just three seconds long.
  • The model is based on a technology from Meta called EnCodec.
  • To train VALL-E, its creators used LibriLight, an audio library assembled by Meta that contains 60,000 hours of English speech from more than 7,000 speakers.
Microsoft's samples show how the model can produce variations in voice tone by changing the random seed used in the generation process. VALL-E can also mimic the acoustic environment of the sample audio, for example how a voice sounds over the phone.
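
To make the random seed idea concrete, here is a toy sketch in Python. It is not VALL-E's code, which has not been released. The 8-entry codebook, the weights, and the function name sample_codes are all made up for illustration; the sketch only shows the general behavior of a sampling-based generator, where the same seed reproduces the same output and a different seed gives a different one.

```python
import random

# Hypothetical stand-ins (not from VALL-E): a tiny "codebook" of 8 audio
# codes and a fixed set of sampling weights for them.
CODES = list(range(8))
WEIGHTS = [5, 3, 2, 1, 1, 1, 1, 1]


def sample_codes(seed, length=20):
    """Draw a sequence of codes using a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, so runs are repeatable
    return rng.choices(CODES, weights=WEIGHTS, k=length)


take_a = sample_codes(seed=0)
take_b = sample_codes(seed=0)  # same seed as take_a
take_c = sample_codes(seed=1)  # different seed

print("same seed, same output:", take_a == take_b)
print("different seed, same output:", take_a == take_c)
```

In a real speech model the sampled codes would be decoded back into audio, so a different seed would be heard as a different take on the same sentence.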

This all seems very interesting and cool, but we can't ignore the fact that it could be used for harmful purposes, such as blackmailing celebrities or politicians with cloned audio clips. With this in mind, the Microsoft team has not made the code public, so that no one can misuse it. There should also be a companion release that can tell which audio clips are real and which are synthesized. So let's give it time and hope for the best in the future.
