I am working on a speech recognition project where the task is to detect filler sounds such as "um", "uh", and "eh" in audio clips of children/students speaking English. Their spoken English is not very fluent.
How can this be done using CMUSphinx, Mozilla DeepSpeech, Google Cloud Speech, or Kaldi? Or do I need to start from scratch?
I have also gone through other posts and papers on how to build an ASR system, but since this is not a long-term project, I do not have the time to build one from scratch and evaluate the results. I am also okay with lower accuracy for now, which I can work on improving later.
Upvotes: 2
Views: 665