Speech Recognition based on a collection of audio

I have a large collection of audio files with their transcripts in a foreign language.
I want to be able to recognize whether the user recites the right words from the text.
How do I start approaching this using CMU Sphinx? Do I need a language model, acoustic model?
I would like some guidance please and where to start from.

Upvotes: 0

Answers (1)

Nikolay Shmyrev

Reputation: 25220

How do I start approaching this using CMU Sphinx?

Recognize the user's audio and compare the result with the expected transcript. Where they don't match, you can warn the user.
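The comparison step could look roughly like this. It is a minimal sketch that assumes you already have the recognizer's output as a plain string; it does not call any CMU Sphinx API. It aligns the words with Python's standard `difflib.SequenceMatcher` and reports the substituted, skipped and extra words. The tokenization (`\w+`, lowercased) is a simplifying assumption, and a real language may need its own normalization.

```python
import difflib
import re


def normalize(text: str) -> list[str]:
    """Lowercase the text and split it into words, dropping punctuation.

    Assumption: simple regex tokenization; adapt it to your language.
    """
    return re.findall(r"\w+", text.lower())


def compare_recitation(expected: str, recognized: str) -> list[tuple[str, list[str], list[str]]]:
    """Return word-level mismatches as (operation, expected_words, recognized_words).

    operation is "replace" (wrong word), "delete" (word skipped by the user)
    or "insert" (extra word the user said).
    """
    ref = normalize(expected)
    hyp = normalize(recognized)
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)
    mismatches = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            mismatches.append((op, ref[i1:i2], hyp[j1:j2]))
    return mismatches


if __name__ == "__main__":
    expected = "the quick brown fox jumps over the lazy dog"
    # Hypothetical recognizer output; in practice it comes from the decoder.
    recognized = "the quick brown fox jumped over lazy dog"
    for op, ref_words, hyp_words in compare_recitation(expected, recognized):
        print(f"{op}: expected {ref_words!r}, heard {hyp_words!r}")
```

Keep in mind that recognition errors will also show up as mismatches, so in practice you may want to tolerate some of them before warning the user.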

Do I need a language model, acoustic model?

Yes, you need both. You can build them from your collection, but you still need some bootstrap data to start from. For more specific advice, it would help to mention which language you are working with.
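As a rough illustration of using your collection as training data, the sketch below turns a mapping of audio file ids to transcripts into the two list files that acoustic model training with SphinxTrain expects, as we understand the layout described in the CMUSphinx training tutorial: a `fileids` list (one audio path per line, without extension) and a `transcription` list (`<s> words </s> (utterance_id)`). It also collects the vocabulary, which you would need to cover in a pronunciation dictionary. The function names, the lowercasing and the tokenization are assumptions for illustration; check the exact file conventions against the tutorial before training.

```python
import re


def tokenize(text: str) -> list[str]:
    """Assumption: lowercase words, punctuation dropped."""
    return re.findall(r"\w+", text.lower())


def build_training_lists(utterances: dict[str, str]) -> tuple[str, str, list[str]]:
    """Build fileids text, transcription text and a sorted vocabulary.

    utterances maps a file id (audio path relative to the wav folder,
    without extension, e.g. "speaker_1/file_001") to its transcript.
    """
    fileids_lines = []
    transcription_lines = []
    vocab = set()
    for file_id, text in sorted(utterances.items()):
        words = tokenize(text)
        vocab.update(words)
        fileids_lines.append(file_id)
        # The utterance id in parentheses is the file's basename.
        utt_id = file_id.rsplit("/", 1)[-1]
        transcription_lines.append(f"<s> {' '.join(words)} </s> ({utt_id})")
    fileids = "\n".join(fileids_lines) + "\n"
    transcription = "\n".join(transcription_lines) + "\n"
    return fileids, transcription, sorted(vocab)


if __name__ == "__main__":
    # Hypothetical sample data standing in for your real collection.
    data = {
        "speaker_1/file_002": "Good morning",
        "speaker_1/file_001": "Hello world",
    }
    fileids, transcription, vocab = build_training_lists(data)
    print(fileids, end="")
    print(transcription, end="")
    print(vocab)
```

You would then write the two strings into the files your training setup refers to (for example a hypothetical `etc/mydb_train.fileids` and `etc/mydb_train.transcription`). The same transcripts can also serve as the text corpus for the language model.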

I would like some guidance please and where to start from.

Start with the tutorial: https://cmusphinx.github.io/wiki/tutorial

Upvotes: 0
