Watson speech-to-text: Narrowband producing better results than Broadband?

Question

I'm using IBM Watson to transcribe a video library that we have. I'm currently doing initial research into its efficacy and accuracy.

The videos in question have OK to very good sound quality, so based on the Watson documentation I should be using the Broadband model to transcribe them.

However, I've tested both the Narrowband and Broadband models, and I'm finding that Narrowband is always either slightly better or, in some cases, a lot better (up to 10%).
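For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of one common way to score a transcript against a hand-made reference: word error rate (WER), computed as word-level edit distance divided by the number of reference words. The function name `word_error_rate` and the simple lowercase-and-split normalization are assumptions for illustration, not part of any Watson tooling, and this is not necessarily the metric behind the figures above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Naive normalization (an assumption): lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # Dynamic-programming edit distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Scoring the Narrowband and Broadband transcripts of the same clip against the same reference gives two numbers that can be compared directly, provided punctuation and casing are normalized the same way for both.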

Has anyone else done any similar testing? It's contrary to the documentation so I'm a little reluctant to just go ahead and use Narrowband for everything, but I may have to based on the results.

I'm using ffmpeg to convert the videos to audio files to send to Watson, and the audio files show 48 kHz sampling rates, which again suggests I should be using Broadband and getting better results with it.
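As a sanity check on the sampling rate, here is a minimal sketch that reads the rate from a WAV file's header and maps it to a model family. It assumes the ffmpeg output is an uncompressed WAV file (Python's standard `wave` module cannot read other containers), and the thresholds reflect my understanding of the Watson documentation: Broadband for audio sampled at 16 kHz or higher, Narrowband for 8 kHz audio. The helper names `sample_rate_hz` and `documented_model_family` are hypothetical.

```python
import wave


def sample_rate_hz(path: str) -> int:
    """Return the sampling rate stored in a WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def documented_model_family(rate_hz: int):
    """Map a sampling rate to a model family.

    Thresholds are an assumption based on my reading of the Watson docs:
    16 kHz and above is Broadband, 8 kHz up to 16 kHz is Narrowband.
    Returns None for rates below 8 kHz.
    """
    if rate_hz >= 16000:
        return "broadband"
    if rate_hz >= 8000:
        return "narrowband"
    return None
```

For a 48 kHz file, `documented_model_family(sample_rate_hz(path))` returns `"broadband"`, which matches the expectation stated above and makes the Narrowband results the surprising part.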

Hoping someone out there has done similar research and can help.

Thanks in advance.

