Reputation: 11
I'm using IBM Watson to transcribe a video library that we have. I'm currently doing initial research into its efficacy and accuracy.
The videos in question have OK to very good sound quality and, based on the Watson documentation, I should be using the Broadband model to transcribe them.
However, I've tested both the Narrowband and Broadband models, and I'm finding that Narrowband is always either slightly better or, in some cases, a lot better (up to 10%).
Has anyone else done any similar testing? It's contrary to the documentation so I'm a little reluctant to just go ahead and use Narrowband for everything, but I may have to based on the results.
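One common way to score a Narrowband vs. Broadband comparison like this is word error rate (WER) against a hand-corrected reference transcript. Whether WER is the metric behind the figures above is an assumption, and the function name `word_error_rate` is purely illustrative. A minimal sketch:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Running both models' transcripts of the same clip through this function against the same reference gives two directly comparable numbers. Note that this sketch only lowercases the text; punctuation and number formatting would need to be normalized separately.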
I'm using ffmpeg to convert the videos to audio files to send to Watson, and the audio files show 48 kHz sampling rates, which again suggests I should be using, and getting better results from, the Broadband model.
Hoping someone out there has done similar research and can help.
Thanks in advance.
Upvotes: 1
Views: 663
Reputation: 795
Do you know what the original sampling rate of the audio is? Maybe it was originally recorded at 8 kHz and then upsampled. If that were the case, the frequencies above 4 kHz would have been lost, and the right model to use would be the Narrowband model. You can check this in a spectrogram using, for example, Audacity (https://github.com/audacity/audacity).
Another explanation would be that the n-grams in your video are better predicted by the language model that the Narrowband system uses. I suggest sharing your audio file with the Watson support team to get further insight (you can go to the Bluemix portal and then click on "support").
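If you would rather script the check than eyeball a spectrogram, here is a rough sketch that estimates the frequency below which almost all of the signal energy sits. It assumes you have already decoded the audio into a mono NumPy array; the function name `effective_bandwidth_hz` and the 99.9% energy threshold are illustrative choices, not anything from Watson or Audacity.

```python
import numpy as np


def effective_bandwidth_hz(samples, sample_rate, energy_fraction=0.999):
    """Return the frequency (Hz) below which `energy_fraction` of the energy lies."""
    samples = np.asarray(samples, dtype=np.float64)
    # A Hann window reduces spectral leakage from the edges of the clip.
    windowed = samples * np.hanning(len(samples))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)

    cumulative = np.cumsum(power)
    total = cumulative[-1]
    if total == 0:
        return 0.0  # silent clip: no energy at any frequency
    idx = int(np.searchsorted(cumulative, energy_fraction * total))
    return float(freqs[min(idx, len(freqs) - 1)])
```

If a file reported as 48 kHz comes back with an effective bandwidth at or below roughly 4 kHz, that is consistent with audio that was recorded at 8 kHz and upsampled later. Real recordings are noisy, so treat the result as a hint rather than proof.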
Upvotes: 3