Reputation: 21
I have previously been using the IBM Watson speech to text service to transcribe full audio files that have been pre-recorded. However, I am now trying to do live transcription while using the speaker identification feature. This means that I cannot send each short file (recording audio in about 30 second chunks) individually since the context of the speakers has to be maintained. How can I do this while still utilizing Python?
Upvotes: 1
Views: 1045
Reputation: 5330
You need to use the WebSocket interface for real-time transcription: you stream chunks of audio over a single connection and Watson responds with transcriptions as they become available. You just need to detect silence to break the stream up into chunks.
You also need to specify the language model to be used for the transcription. If the source audio comes from a phone call, use the Narrowband models for the best results.
IBM® recommends that you use the broadband model for responsive, real-time applications (for example, for live-speech applications). Reference.
You can check a full example of real-time Watson STT in Python in this link. That example uses Nexmo, but the same logic applies to any application that needs real-time transcripts.
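A minimal sketch of the WebSocket approach with the `ibm-watson` Python SDK (`pip install ibm-watson`) might look like the following. The API key, service URL, and microphone plumbing are placeholders you would fill in for your own setup; the helper that extracts the transcript from the result payload is my own illustrative addition, not part of the SDK.

```python
# Sketch: live transcription over Watson's WebSocket interface with speaker
# labels enabled. Assumes the `ibm-watson` SDK; credentials are placeholders.
import queue

try:
    from ibm_watson import SpeechToTextV1
    from ibm_watson.websocket import RecognizeCallback, AudioSource
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
    HAVE_SDK = True
except ImportError:  # let the sketch load even without the SDK installed
    HAVE_SDK = False
    RecognizeCallback = object


def latest_transcript(result):
    """Pull the newest transcript string out of a Watson result payload.

    Illustrative helper: Watson returns JSON with a `results` list, each
    entry holding `alternatives` with a `transcript` field.
    """
    alternatives = result.get('results', [{}])[-1].get('alternatives', [{}])
    return alternatives[0].get('transcript', '')


class LiveCallback(RecognizeCallback):
    """Receives interim and final hypotheses as Watson streams them back."""

    def on_hypothesis(self, hypothesis):
        print('interim:', hypothesis)

    def on_data(self, data):
        print('final:', latest_transcript(data))

    def on_error(self, error):
        print('error:', error)


if __name__ == '__main__' and HAVE_SDK:
    authenticator = IAMAuthenticator('YOUR_API_KEY')  # placeholder
    stt = SpeechToTextV1(authenticator=authenticator)
    stt.set_service_url('YOUR_SERVICE_URL')           # placeholder

    # Feed raw PCM chunks from your microphone into this queue.
    audio_queue = queue.Queue()
    audio_source = AudioSource(audio_queue, is_recording=True, is_buffer=True)

    # One long-lived connection carries the whole session, so speaker
    # context is maintained across chunks instead of resetting per file.
    stt.recognize_using_websocket(
        audio=audio_source,
        content_type='audio/l16; rate=16000',
        recognize_callback=LiveCallback(),
        model='en-US_BroadbandModel',
        interim_results=True,
        speaker_labels=True,
    )
```

Because the audio goes through one continuous WebSocket session rather than separate 30-second files, the `speaker_labels` numbering stays consistent for the duration of the connection.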
Upvotes: 0