Best approach to compare recognized speech with a known text

Question

Given a known manuscript (text) which I expect the user to read (more or less accurately), what is the best approach to recognize the user's progress within the manuscript?

While I'm searching for a particular solution on iOS, I'm also interested in a more general answer.

iOS provides a speech recognition framework called Speech that can recognize arbitrary speech. My current approach is to take the string results from this framework and match them against the manuscript. However, this seems to carry considerable overhead: it would save resources and increase precision if I could first feed the speech recognizer the expected words, so that it "knows" what to listen for.

For example, when the next word in the manuscript is "fish", I don't need the speech recognizer to search the whole English dictionary for the word that best matches the recorded audio; I only need a probability of how likely it is that the user just said "fish".

I think this is very similar to keyword spotting, except that I'm spotting not just a few keywords but every word in a whole manuscript.

Unfortunately, I haven't been able to find such an API on iOS. Is there a better way to achieve this kind of "speech tracking" than the string-matching approach described above?
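
To make the string-matching approach described above concrete, here is a minimal sketch, written in Python rather than Swift to keep it platform-neutral. The names `ManuscriptTracker`, `lookahead`, and `threshold` are hypothetical, and the similarity score is a text-level stand-in computed on the recognized strings, not an acoustic probability. It also assumes that each call to `feed` receives only newly recognized words, not a cumulative transcript.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


class ManuscriptTracker:
    """Tracks the reader's position as an index into the manuscript's words.

    Hypothetical helper for illustration; not part of any speech framework.
    """

    def __init__(self, manuscript, lookahead=5, threshold=0.8):
        self.words = tokenize(manuscript)
        self.position = 0           # index of the next expected word
        self.lookahead = lookahead  # size of the window of upcoming words to search
        self.threshold = threshold  # minimum string similarity to accept a match

    def feed(self, recognized_text):
        """Advance past each newly recognized word found in the lookahead window."""
        for heard in tokenize(recognized_text):
            window = self.words[self.position:self.position + self.lookahead]
            for offset, expected in enumerate(window):
                similarity = SequenceMatcher(None, heard, expected).ratio()
                if similarity >= self.threshold:
                    # Accept the first sufficiently similar word and move past it.
                    self.position += offset + 1
                    break
            # Words with no match in the window are ignored (likely misrecognitions).
        return self.position


tracker = ManuscriptTracker("The quick brown fish jumps over the lazy dog.")
print(tracker.feed("the quick brown"))  # position after three matched words
print(tracker.feed("fisch jumps"))      # a near miss like "fisch" still matches "fish"
```

The lookahead window tolerates skipped words, and the similarity threshold tolerates small misrecognitions, but the recognizer still searches its full vocabulary first, which is exactly the overhead the question is about.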

