A Common Voice
The intersection of technology and language is something we think about a lot. We are fundamentally in the business of facilitating communication, and technology seems to be continuously making this easier!
The Common Voice project is a new initiative by Mozilla to create an open-source speech database. The dataset is intended to power Mozilla's open-source speech recognition engine, called DeepSpeech. At the moment, visitors to the site can ‘donate’ their voice by recording sentences that appear on-screen. Users can also help by verifying submissions. A given recording must be verified by two people before it is added to the database.
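To make the review rule concrete, here is a minimal sketch of the two-verification idea, assuming a simple majority-style rule; the function name and exact thresholds are illustrative, not Mozilla's actual implementation.

```python
# Hypothetical simplification of Common Voice's review rule:
# a clip needs two positive reviews to be accepted, and is
# rejected once two reviewers vote it down.

def clip_status(up_votes: int, down_votes: int) -> str:
    """Classify a recorded clip by its review votes."""
    if up_votes >= 2 and up_votes > down_votes:
        return "validated"
    if down_votes >= 2 and down_votes > up_votes:
        return "invalidated"
    return "pending"
```

For example, a freshly recorded clip starts out "pending" and only enters the dataset once two listeners have confirmed it.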
Historically, recording a new voice for a device has been a time-consuming process. Hundreds of phrases and sounds had to be recorded to create a database large enough to answer open-ended questions, and hiring a professional and renting a sound studio is expensive. By keeping this dataset open source, Mozilla hopes to cut out this large expense for new companies.
Additionally, the virtual assistants we are most familiar with – Siri, Cortana, and Alexa – have female names and voices. There are many theories about why this is the case. Do male programmers assume assistants should be female? Is the female voice clearer and easier to understand? Do we find the idea of a female presence in our house less threatening? In the future, it may be important to have more variety. That is just one of the aims of this project.
Real World Voice Recognition
One of the goals of the Common Voice project is to collect as much data in real-world conditions as possible. When we currently use voice assistants, it is not in a studio setting! The washing machine might be running or there might be other voices in the room. Training speech recognition software on real-world data will make it better in the long run. Users are encouraged to upload audio even if they can only record on their phone. The Common Voice project also wants to hear from non-native speakers. This could be an excellent way to practice a second language!
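One way real-world conditions show up in training pipelines is noise augmentation: mixing background noise into clean speech at a chosen signal-to-noise ratio. The sketch below illustrates the idea with NumPy; the function name and SNR convention are assumptions for illustration, not part of Common Voice itself.

```python
import numpy as np

def mix_noise(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix background noise into speech at the given signal-to-noise ratio (dB)."""
    noise = noise[: len(speech)]  # trim the noise to the speech length
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # guard against silent noise
    # Scale the noise so 10 * log10(speech_power / scaled_noise_power) == snr_db
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise
```

Training on audio augmented this way (or, better, on genuinely noisy donations) teaches a model that speech rarely arrives in studio-clean form.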
In the future, we will increasingly rely on voice recognition to function. We can already talk to our phones, our cars, our lightbulbs, and basically every other device in our house. We’ve written before about how frustrating it can be when these smart devices don’t understand us. If a richer dataset can help devices recognize a wider variety of accents, we’re all for it.
Because voice donors are asked about their ethnicity, age, and accent, the dataset might also be used to identify who is speaking. Applications range from distinguishing between family members to estimating the age or ethnicity of a speaker from their voice alone.
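That demographic metadata ships alongside the audio. As a hedged sketch, Common Voice releases include tab-separated metadata files with per-clip speaker fields; the column names below (`path`, `age`, `gender`, `accents`) reflect recent releases but should be checked against the actual download, and the sample rows are invented.

```python
import csv
import io

# Invented sample rows in the Common Voice TSV metadata format.
SAMPLE_TSV = (
    "client_id\tpath\tsentence\tage\tgender\taccents\n"
    "a1\tclip1.mp3\tHello there.\ttwenties\tfemale\tscottish\n"
    "b2\tclip2.mp3\tGood morning.\tsixties\tmale\tindian\n"
    "c3\tclip3.mp3\tNice weather.\ttwenties\tmale\t\n"
)

def clips_by_age(tsv_text: str, age: str) -> list:
    """Return the clip paths whose speaker reported the given age band."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row["path"] for row in reader if row["age"] == age]
```

Filtering like this is how researchers could build, say, an age-balanced training set, or evaluate how well a recognizer handles a specific accent group.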
Advances in the kind of robo-voice we associate with prosthetic voices (like Stephen Hawking's) might also benefit from this dataset. Vocal engineers have already made progress in this area, giving those who rely on synthetic speech the ability to sound more like themselves again.
Check out all of the languages currently in the database here. If nothing else, verifying sentences is a great way to kill five minutes!