jotadepicas

Reputation: 2493

Pocketsphinx - is audio pre-processing necessary / recommended?

I am using pocketsphinx for speech recognition with a Spanish acoustic model and a JSGF grammar, with decent results so far.

However, I'm getting erroneous recognition results for audio files that, at least to my ear, seem perfectly intelligible (little background noise, sampling frequency and bit depth matching the acoustic model parameters, etc.).

Also, the audio files that are not correctly recognized do not seem to differ much from the ones that are (in fact, they sound pretty much the same to me).

So I'm guessing there is something in the audio that makes it harder to recognize, perhaps noise frequencies or other artifacts that need to be filtered out (background noise, "pop" sounds in speech, frequencies outside the band of the human voice, etc.)?

In short, do you know if pocketsphinx already does any of this and, if not, do you know of any best-practice filter, transformation, etc. to apply to an audio file in order to improve speech recognition results?

Thanks!

Upvotes: 0

Views: 565

Answers (1)

Nikolay Shmyrev

Reputation: 25220

No, any preprocessing is usually quite harmful for speech recognition accuracy.

Modern speech recognition algorithms are built in such a way that even slight preprocessing can make results much worse. The difference will not be easy to hear, since your own speech recognition capabilities are far superior to a computer's. Things like a slight echo added to improve naturalness, or a simple mp3 compression/decompression cycle, might reduce accuracy significantly.

The solution is to train a model on the same kind of audio you want to recognize, for example on mp3-decompressed audio instead of clean audio. The default model is trained on clean audio, which makes it not very robust to sound modifications. Such multi-style training has its own disadvantages, because it makes the training data very large, so it is still a subject of ongoing research.
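
Before retraining anything, it may be worth confirming that the problem files really have the format the acoustic model expects, since the question only checks this by ear. Below is a minimal sketch using Python's standard wave module. The function name check_wav_format is hypothetical, and the default values (16 kHz, 16-bit, mono) are assumptions that should be replaced with whatever your acoustic model was actually trained on.

```python
import wave


def check_wav_format(path, expected_rate=16000, expected_width_bytes=2,
                     expected_channels=1):
    """Return a list of mismatches between a WAV file and the expected format.

    The defaults are assumptions, not values read from any model:
    adjust them to match your acoustic model's parameters.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()      # bytes per sample
        channels = wav.getnchannels()

    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if width != expected_width_bytes:
        problems.append(
            f"sample width is {width * 8} bits, expected {expected_width_bytes * 8} bits"
        )
    if channels != expected_channels:
        problems.append(f"{channels} channel(s), expected {expected_channels}")
    return problems
```

An empty list means the container format matches. It says nothing about whether the audio went through lossy compression or other processing earlier, which is the kind of mismatch described above.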

Upvotes: 1
