Machine Learning with phonics ASR

Question

There is a lot of research on Automatic Speech Recognition (ASR) that converts speech to text, and these tools use deep learning to do it.

I have found that the way they work is based on the English language. Given audio of the word "Phonics", the sounds might be heard as something like "Foniks", but because the closest English word is "Phonics", that is what they output.

Google APIs can provide ASR that gives us the end result (the words). Is there any tool or open-source project that can give us the phonetic sounds, something like "ˈfəʊnɪks" rather than "Phonics"?

Thanks.

Dmytro Prylipko · Accepted Answer

There are several open-source tools for ASR. Kaldi, CMU Sphinx and HTK are the most popular and best documented. Kaldi is probably the best choice if you want to use DNNs for ASR.

However, the form of the recognition result depends on your vocabulary. If you want the word ˈfəʊnɪks instead of Phonics, you have to define it in the vocabulary (the pronunciation lexicon). For instance:

!SIL sil
 spn
eight ey t
five f ay v
...
f_ey_ow_n_i_k_s f ey ow n i k s
....
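The lexicon entry for the phonetic "word" follows a simple pattern: the word label is its own phone sequence joined with underscores, followed by the same phones separated by spaces. If you need many such entries, a small script can generate them. This is a minimal Python sketch assuming the lexicon format shown above (word label, then space-separated phones); the function names and the optional phone-set check are hypothetical helpers, not part of Kaldi.

```python
from typing import Iterable, List, Optional, Sequence, Set


def phone_word(phones: Sequence[str]) -> str:
    # The word label spells out its own pronunciation, e.g. f_ey_ow_n_i_k_s.
    return "_".join(phones)


def lexicon_lines(prons: Iterable[Sequence[str]],
                  phone_set: Optional[Set[str]] = None) -> List[str]:
    """Build lexicon lines whose word labels are the phone strings themselves.

    phone_set (hypothetical, optional): the phones your acoustic model knows;
    entries using any other phone are rejected instead of silently written out.
    """
    lines = []
    for phones in prons:
        if phone_set is not None:
            unknown = [p for p in phones if p not in phone_set]
            if unknown:
                raise ValueError(f"unknown phones {unknown} in {list(phones)}")
        # Lexicon format: word label, then its phones, all separated by spaces.
        lines.append(" ".join([phone_word(phones), *phones]))
    return lines


if __name__ == "__main__":
    for line in lexicon_lines([["f", "ey", "ow", "n", "i", "k", "s"]]):
        print(line)  # f_ey_ow_n_i_k_s f ey ow n i k s
```

Because the recognizer can only output labels that exist in the lexicon, filling it with entries like these is what makes the result come out as phones rather than English spellings.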

Using Unicode symbols for word representation is not possible (as far as I remember), so I replaced them with X-SAMPA notation.
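For reference, going from IPA to X-SAMPA is mostly a symbol-by-symbol substitution. The Python sketch below covers only a small hand-picked subset of the X-SAMPA table (an assumption; extend it for your own phone inventory). It keeps plain ASCII letters unchanged and raises an error on any symbol it does not know, rather than guessing.

```python
# A small, hand-picked subset of the IPA -> X-SAMPA table (not the full standard).
IPA_TO_XSAMPA = {
    "\u02c8": '"',    # ˈ primary stress
    "\u02cc": "%",    # ˌ secondary stress
    "\u02d0": ":",    # ː length mark
    "\u0259": "@",    # ə schwa
    "\u028a": "U",    # ʊ
    "\u026a": "I",    # ɪ
    "\u025b": "E",    # ɛ
    "\u00e6": "{",    # æ
    "\u0251": "A",    # ɑ
    "\u0254": "O",    # ɔ
    "\u028c": "V",    # ʌ
    "\u0283": "S",    # ʃ
    "\u0292": "Z",    # ʒ
    "\u03b8": "T",    # θ
    "\u00f0": "D",    # ð
    "\u014b": "N",    # ŋ
    "\u0279": "r\\",  # ɹ
    "\u0261": "g",    # ɡ (IPA script g)
}


def ipa_to_xsampa(ipa: str) -> str:
    out = []
    for ch in ipa:
        if ch in IPA_TO_XSAMPA:
            out.append(IPA_TO_XSAMPA[ch])
        elif "a" <= ch <= "z":
            # Plain ASCII letters (f, n, k, s, ...) keep the same symbol in X-SAMPA.
            out.append(ch)
        else:
            raise ValueError(f"no X-SAMPA mapping for {ch!r} (U+{ord(ch):04X})")
    return "".join(out)


if __name__ == "__main__":
    print(ipa_to_xsampa("ˈfəʊnɪks"))  # "f@UnIks  (the leading " marks primary stress)
```

Note that some X-SAMPA symbols (", {, \) may be awkward in word labels or shell scripts, which is one reason to use a simpler lowercase phone set like the one in the lexicon above.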

Follow this tutorial for an in-depth explanation.
