Reputation: 2977
I have a large collection of audio files with their transcripts in a foreign language.
I want to be able to recognize whether the user recites the right words from the text.
How do I start approaching this using CMU Sphinx? Do I need a language model, acoustic model?
I would like some guidance please and where to start from.
Upvotes: 0
Views: 264
Reputation: 25220
How do I start approaching this using CMU Sphinx?
You recognize the audio and compare the recognition result to the transcription. If they do not match, you can warn the user.
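The comparison step could look roughly like the sketch below. It assumes you already have the recognizer output as a plain string (getting that string out of CMU Sphinx is not shown), and the function names `tokenize` and `find_mismatches` are made up for illustration. It uses only Python's standard `difflib` to align the two word sequences.

```python
import difflib
import re


def tokenize(text):
    # Lowercase and keep only word characters, so punctuation and case
    # differences are not reported as mistakes (works for Unicode text).
    return re.findall(r"\w+", text.lower())


def find_mismatches(reference, hypothesis):
    """Return the word-level differences between transcript and recognition result.

    Each item is (tag, reference_words, recognized_words), where tag is
    'replace', 'delete' (word missing from the recitation) or 'insert'
    (extra word in the recitation).
    """
    ref = tokenize(reference)
    hyp = tokenize(hypothesis)
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)
    mismatches = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            mismatches.append((tag, ref[i1:i2], hyp[j1:j2]))
    return mismatches


if __name__ == "__main__":
    # Hypothetical example strings; in practice the second one comes from the recognizer.
    for tag, expected, heard in find_mismatches("the quick brown fox",
                                                "the quick red fox"):
        print(tag, expected, heard)
```

An empty result means the recitation matched the text word for word. Keep in mind the recognizer itself makes errors, so you may want to tolerate a few mismatches before warning the user.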
Do I need a language model, acoustic model?
Yes, you need both. You can build them from your collection, but you still need some bootstrap data to start from. To get more specific advice, it would help to mention the language.
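As a first step toward building the models from your collection, you would typically normalize the transcripts and collect the vocabulary, since the language model is trained on the text and every word needs a pronunciation in the dictionary. A minimal sketch, assuming the transcripts are available as a list of strings; the helper names are hypothetical, and the exact file formats the Sphinx training tools expect are described in the tutorial, not here.

```python
import re
from collections import Counter


def normalized_corpus(transcripts):
    # One cleaned-up line per transcript: lowercase, punctuation removed.
    return [" ".join(re.findall(r"\w+", line.lower())) for line in transcripts]


def build_vocabulary(transcripts):
    # Count how often each word occurs across all transcripts.
    counts = Counter()
    for line in normalized_corpus(transcripts):
        counts.update(line.split())
    return counts


if __name__ == "__main__":
    # Hypothetical sample data standing in for your transcript collection.
    sample = ["Hello, world!", "Hello again."]
    for word, count in build_vocabulary(sample).most_common():
        print(word, count)
```

The word list tells you which pronunciations the dictionary has to cover, and the normalized lines are the raw material for the language model text.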
I would like some guidance please and where to start from.
Start with the tutorial: https://cmusphinx.github.io/wiki/tutorial
Upvotes: 0