Sun Tzun

Reputation: 113

Improving speech-to-text recognition

I just started studying machine learning and related technologies, and I chose speech recognition as a starting point. I tried Google Cloud Speech-to-Text and ran recognition on the Google sample and on my own sample. As it turned out, it didn't correctly recognize all the words in my sample.

Upvotes: 1

Views: 268

Answers (1)

Google Cloud Speech-to-Text (STT) is powered by pre-trained machine learning models; it is, however, an ever-improving service.

To make sure you are getting the most out of STT, please review the Best Practices published in the public documentation. These cover, among other things:

  • Sampling rate
  • Transmission codec
  • Background noise
  • Input channel usage
  • Frame size

Without your sample file, it is hard to pinpoint what you need to change to improve the quality of the results. Note, however, that the Google tutorials are already designed with the above best practices in mind.
As a quick example, the how-to guide "Performing synchronous speech recognition on a local file" applies two of them:

  • Encoding was done using LINEAR16 codec
  • Sampling rate is at 16000 hertz

Please review this document on how to optimize audio files to learn more.
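
As a rough way to check your own sample against the two settings above, here is a minimal sketch using only Python's standard `wave` module. The function names `audio_issues` and `make_silence` are made up for this illustration, and the mono check is my assumption based on the "input channel usage" item, not something stated in the guide. It only works on uncompressed WAV files.

```python
import io
import wave


def audio_issues(source, expected_rate=16000):
    """List the ways a WAV file differs from 16 kHz, 16-bit, mono PCM.

    `source` can be a file path or a binary file-like object.
    """
    issues = []
    with wave.open(source, "rb") as wav:
        if wav.getframerate() != expected_rate:
            issues.append(
                f"sample rate is {wav.getframerate()} Hz, expected {expected_rate} Hz"
            )
        # LINEAR16 means 16-bit samples, i.e. 2 bytes per sample.
        if wav.getsampwidth() != 2:
            issues.append(f"sample width is {wav.getsampwidth() * 8} bits, expected 16")
        # Assumption: a single channel is what you want for a simple test.
        if wav.getnchannels() != 1:
            issues.append(f"{wav.getnchannels()} channels found, expected 1 (mono)")
    return issues


def make_silence(rate, channels=1, seconds=0.1):
    """Build a short silent 16-bit WAV in memory, only to try the check out."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds) * channels)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for problem in audio_issues(make_silence(44100, channels=2)):
        print(problem)
```

If the list comes back empty, the file header at least matches the configuration used in the tutorial. It says nothing about background noise or microphone quality, which you would still need to check by ear.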

Beyond that, there are ways to adapt the models to your specific needs. Please review this document on how to improve transcription results and, based on your question, this section on how to improve recognition of words and phrases. You might also want to dive into classes, which are really helpful when you are implementing for a specific business case.
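
To give an idea of what improving recognition of words and phrases looks like in practice, here is a minimal sketch that builds the JSON body of a recognize request with phrase hints. The helper name `build_recognize_request` and the example phrases are hypothetical, and I am assuming the v1 REST field names (`speechContexts`, `phrases`), so please double-check them against the current API reference before relying on this.

```python
import base64
import json


def build_recognize_request(audio_bytes, phrases, language_code="en-US"):
    """Return a request body with phrase hints for words that get misrecognized."""
    return {
        "config": {
            # Same encoding and sample rate as in the how-to guide.
            "encoding": "LINEAR16",
            "sampleRateHertz": 16000,
            "languageCode": language_code,
            # Phrase hints: words or phrases that are likely to occur in the audio.
            "speechContexts": [{"phrases": list(phrases)}],
        },
        # Inline audio has to be base64-encoded in the JSON body.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


if __name__ == "__main__":
    # Placeholder audio (one silent sample) and made-up phrases, for illustration only.
    body = build_recognize_request(b"\x00\x00", ["Sun Tzu", "speech adaptation"])
    print(json.dumps(body, indent=2))
```

The idea is to list the words from your sample that came out wrong and send them as hints together with the audio. The sketch only builds the body; sending it still requires an authenticated call to the Speech-to-Text API, for example through the official client library.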

There are plenty of options for Speech-to-Text and other ML/AI technologies, and it is hard to rank one over another, but please review this blog post, in which this topic is explored.

Upvotes: 2
