What is Big Voice?

Big Voice is a $20B a year business built on the collection and analysis of voice data. Voice printing is a form of biometric data analysis that detects a customer’s unique vocal signature. The largest data set on the internet is voice. Voice-to-text is reliable, cheap and built into our smartphones and smart speakers. These devices are collecting voice data all the time.

A voice print is a digital model of the unique vocal characteristics of an individual which, like other biometrics such as facial recognition and fingerprinting, uses machine learning (ML) to help businesses ascertain the identities of their customers. 

So how do they work exactly? Traditional voice biometric systems apply what is called “feature extraction” to one or more speech samples. This feature extraction process, common when dealing with machine learning algorithms, creates personalized calculations about a person’s vocal characteristics, which in turn feed into a Universal Background Model, or ‘UBM’.
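To make “feature extraction” a little more concrete, here is a minimal sketch in Python. It is an illustration only: production systems typically use richer features (MFCCs are a common choice), whereas this sketch computes just two simple per-frame values, log energy and zero-crossing rate. The function names, the 400-sample frame length and the 160-sample hop are assumptions made for the example, not details from any particular product.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split a list of audio samples into overlapping fixed-length frames."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, hop)]


def frame_features(frame):
    """Return [log energy, zero-crossing rate] for one frame."""
    energy = sum(s * s for s in frame) / len(frame)
    log_energy = math.log(energy + 1e-10)  # small offset avoids log(0) on silence
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    zcr = crossings / (len(frame) - 1)
    return [log_energy, zcr]


def extract_features(samples, frame_len=400, hop=160):
    """Turn raw samples into one small feature vector per frame."""
    return [frame_features(f) for f in frame_signal(samples, frame_len, hop)]


if __name__ == "__main__":
    # Hypothetical input: 0.1 s of a 200 Hz tone sampled at 16 kHz.
    rate = 16000
    tone = [math.sin(2 * math.pi * 200 * n / rate) for n in range(1600)]
    for vector in extract_features(tone)[:3]:
        print(vector)
```

Each frame becomes a short list of numbers, and it is sequences of such vectors, rather than the raw audio, that the models described next are built from.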

A UBM is essentially a grouping of different voice prints that serves as a repository for future voice prints to be measured against. A new speech sample is compared to both the individual’s personal voice print and the UBM. The difference between the two results is reduced to a single score, which can then be interpreted as “passing” or “failing,” depending on the desired confidence for the usage scenario.
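The comparison described above can be sketched as a log-likelihood ratio. The sketch below assumes that both the personal voice print and the UBM are Gaussian mixture models with diagonal covariance, which is a common textbook setup and not necessarily what any given vendor uses. All names, the toy model parameters and the threshold of 0.0 are hypothetical.

```python
import math


def gaussian_log_pdf(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def gmm_avg_log_likelihood(frames, components):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    components is a list of (weight, means, variances) tuples.
    """
    total = 0.0
    for frame in frames:
        log_terms = []
        for weight, means, variances in components:
            log_terms.append(math.log(weight) + sum(
                gaussian_log_pdf(x, m, v)
                for x, m, v in zip(frame, means, variances)
            ))
        # Log-sum-exp keeps the mixture sum numerically stable.
        peak = max(log_terms)
        total += peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    return total / len(frames)


def verify(frames, speaker_model, ubm, threshold=0.0):
    """Score = speaker log-likelihood minus UBM log-likelihood."""
    score = (gmm_avg_log_likelihood(frames, speaker_model)
             - gmm_avg_log_likelihood(frames, ubm))
    return score, score >= threshold


if __name__ == "__main__":
    # Toy two-dimensional models, invented for illustration.
    speaker_model = [(1.0, [1.0, 1.0], [1.0, 1.0])]
    ubm = [(0.5, [0.0, 0.0], [4.0, 4.0]), (0.5, [3.0, 3.0], [4.0, 4.0])]
    print(verify([[1.0, 1.1], [0.9, 1.0]], speaker_model, ubm))
    print(verify([[3.0, 3.0], [3.2, 2.9]], speaker_model, ubm))
```

Raising the threshold makes the system stricter (fewer false accepts, more false rejects), which is how the “desired confidence for the usage scenario” is expressed in practice.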

Later forms of voice printing involve deep neural networks (DNNs) which, like a UBM, rely on processing representative speech samples, often hundreds of hours of them. To verify a user, a speech sample is evaluated against the fine-tuned DNN model to arrive at a score, which can again be interpreted as “passing” or “failing” depending on the desired confidence for the usage scenario.
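One common way such a score is produced, offered here as an assumption and not as a description of any specific system, is for the network to map each speech sample to a fixed-length vector (an “embedding”) and for verification to compare embeddings by cosine similarity. The sketch below skips the network itself and starts from embeddings that are assumed to exist already; the function names and the 0.7 threshold are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def enroll(embeddings):
    """Average several enrollment embeddings into one voice print."""
    count = len(embeddings)
    return [sum(values) / count for values in zip(*embeddings)]


def verify(voice_print, test_embedding, threshold=0.7):
    """Return (score, passed) for a test sample against a voice print."""
    score = cosine_similarity(voice_print, test_embedding)
    return score, score >= threshold


if __name__ == "__main__":
    # Made-up three-dimensional embeddings; real ones have hundreds of values.
    voice_print = enroll([[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]])
    print(verify(voice_print, [0.92, 0.08, 0.02]))  # same speaker, similar vector
    print(verify(voice_print, [0.0, 1.0, 0.2]))     # different speaker
```

As with the UBM approach, the pass or fail decision comes down to where the threshold is set for the usage scenario.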

These forms of data collection have expanded rapidly over recent years, as voice assistants such as Apple’s Siri and Amazon’s Alexa have developed ways to better understand not only what we are saying but who is saying it, with startling accuracy. This is where it helps to differentiate speech recognition from voice printing: speech recognition works out which words are being said, while voice printing works out who is saying them. But voice assistants are not the only place this technology is applied.

Call centers today, for instance, are using AI to analyze people’s behavior during phone calls, examining the “tone, pace and pitch of every single word” in order to develop customer profiles and boost sales.

As we previously highlighted in the Conversational AI report, conversational intelligence is capable of translating vast amounts of unstructured data into powerful strategic insights in real time.

Over the next few years, Big Voice, as it is known, will be worth as much as $20bn a year, and as the market grows, so do demands for further regulation of this tech.

The reasons for protection are, of course, well founded. Customers today use their voices for everything from passwords to payments, and many companies are harvesting this data with little regard for privacy. Last year, for instance, TikTok sneakily changed its privacy policies to start collecting voice prints.

But the fact that some use this technology less ethically than others does not change the fact that voice printing can be a valuable tool for gaining insights from contact centers, social media and other customer touchpoints.
