How to detect filler sounds like um, uh, etc. using CMUSphinx, Mozilla DeepSpeech, Google STT, etc.?

Question

I am working on a speech recognition project, and the task is to detect filler sounds such as um, uh, and eh in audio clips of children/students speaking English. Their spoken English is not very strong.

How can this be done using CMUSphinx, Mozilla DeepSpeech, Google Cloud Speech, or Kaldi? Or do I need to start from scratch?

I have also gone through other posts and papers on how to build an ASR system, but since this is not a long-term project, I do not have the time to build one from scratch and evaluate the results. I am also okay with lower accuracy for now, which I can work on improving later.

Answers (0)
