Reputation: 71
Does anyone know whether it is possible with Twilio to create multiple audio recordings during a call, based on some kind of audio flag or pattern such as silence? That way you could fire a callback at the end of each portion of speech and generate text while the call is still going.
Thanks!
Upvotes: 2
Views: 877
Reputation: 1
To get live transcription with Twilio, you would need to combine Twilio Media Streams with a third-party speech-to-text service that supports streaming ("infinite") recognition, such as Google Cloud Speech-to-Text. Unfortunately, I don't think there is a native Twilio verb or action that does live speech-to-text transcription. You might be able to run something on iOS, but having a backend server handle this is probably the better option, and it will be easier to scale in the future.
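As a minimal sketch of the Twilio side of this approach, the TwiML below uses <Start><Stream> to fork the call audio to a WebSocket server while the call continues. It is generated with Python's standard xml.etree.ElementTree so it runs without the Twilio SDK; the wss://example.com/media URL, the greeting, and the 60-second <Pause> that keeps the example call open are placeholder assumptions, not part of any official guide.

```python
import xml.etree.ElementTree as ET


def stream_twiml(ws_url: str, pause_seconds: int = 60) -> str:
    """Build TwiML that streams the call audio to a WebSocket server.

    <Start><Stream> runs asynchronously, so the verbs after it keep executing.
    """
    response = ET.Element("Response")
    start = ET.SubElement(response, "Start")
    # Twilio opens a WebSocket to this (hypothetical) wss:// URL and sends audio as JSON messages.
    ET.SubElement(start, "Stream", url=ws_url)
    ET.SubElement(response, "Say").text = "Please start speaking."
    # Placeholder: keep the call open so audio keeps flowing to the stream.
    ET.SubElement(response, "Pause", length=str(pause_seconds))
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    print(stream_twiml("wss://example.com/media"))
```

Your voice webhook would return this XML; the WebSocket server at that URL then forwards the audio to the speech-to-text service.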
At a high level you need to do the following:
Twilio itself has published several guides on how to do this:
I spent some time familiarizing myself with these guides and also wrote a similar live transcription guide in Java using the Dropwizard framework.
These approaches work for proofs of concept but do not address security or scaling of the audio stream processing.
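On the server side of the Media Streams approach, Twilio sends JSON messages over the WebSocket, and "media" messages carry base64-encoded 8 kHz mu-law audio in media.payload. The following is a minimal, stdlib-only sketch (not taken from the guides above) that decodes those frames to 16-bit PCM and groups them into utterances on silence, which is roughly what the question asks for. The RMS threshold of 500, the 25-frame silence window (about 0.5 s, assuming 20 ms frames), and the names decode_media_message and SilenceSegmenter are hypothetical; the WebSocket server and the speech-to-text client are left out.

```python
import base64
import json
import math
from typing import Callable, List, Optional


def mulaw_to_pcm16(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~byte & 0xFF  # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84  # 0x84 is the bias
    return -magnitude if sign else magnitude


def decode_media_message(raw: str) -> Optional[List[int]]:
    """Return PCM samples for a Media Streams "media" message, else None."""
    msg = json.loads(raw)
    if msg.get("event") != "media":
        return None  # "connected", "start", "stop", ... carry no audio
    audio = base64.b64decode(msg["media"]["payload"])
    return [mulaw_to_pcm16(b) for b in audio]


def is_silent(samples: List[int], threshold: float = 500.0) -> bool:
    """Crude energy check: RMS amplitude below an assumed threshold."""
    if not samples:
        return True
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms < threshold


class SilenceSegmenter:
    """Collects frames and calls on_segment after a run of silent frames."""

    def __init__(self, on_segment: Callable[[List[int]], None],
                 max_silent_frames: int = 25, threshold: float = 500.0):
        self.on_segment = on_segment
        self.max_silent_frames = max_silent_frames
        self.threshold = threshold
        self.buffer: List[int] = []
        self.silent_run = 0

    def feed(self, samples: List[int]) -> None:
        if not is_silent(samples, self.threshold):
            self.silent_run = 0
            self.buffer.extend(samples)
            return
        if not self.buffer:
            return  # ignore silence before any speech
        self.buffer.extend(samples)  # keep short pauses inside the utterance
        self.silent_run += 1
        if self.silent_run >= self.max_silent_frames:
            self.on_segment(self.buffer)  # end of one portion of speech
            self.buffer = []
            self.silent_run = 0
```

In a real server you would call decode_media_message on every WebSocket message, feed the samples to the segmenter, and send each finished segment to your speech-to-text service (a streaming API could instead receive the raw frames directly). A fixed energy threshold is crude; a proper voice activity detector would be more robust.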
Upvotes: 0
Reputation: 3434
Twilio Evangelist here.
You could use the timeout attribute on the <Record> verb, which ends the recording after a given number of seconds of silence, to get short 'bursts' of speech. However, a short timeout may mean the recording ends while the caller is still speaking a word, so you would only get half of it! That can make it difficult to decipher what is being said, and I personally would not use this approach.
You can also end a recording on a key press (a DTMF tone) with the finishOnKey attribute, which may meet your needs.
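Combining the two <Record> attributes above, here is a hedged sketch of the TwiML for one recording segment. The action URL is where Twilio goes once the recording ends, so if that endpoint returns the same TwiML again you get a series of separate recordings, each transcribed and reported to transcribeCallback. The URLs, the 2-second timeout, and the record_twiml helper are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def record_twiml(action_url: str, transcribe_callback: str,
                 timeout_s: int = 2, finish_on_key: Optional[str] = None) -> str:
    """Build TwiML that records one segment and requests its transcription."""
    attrs = {
        # Seconds of silence after which Twilio ends this recording.
        "timeout": str(timeout_s),
        # Twilio requests this URL when the recording ends; returning this
        # same TwiML from it starts the next segment.
        "action": action_url,
        "transcribe": "true",
        # Twilio POSTs the transcription here once it is ready.
        "transcribeCallback": transcribe_callback,
        # Avoid a beep before every segment in the loop.
        "playBeep": "false",
    }
    if finish_on_key is not None:
        # A DTMF key press (e.g. "#") also ends the recording;
        # leaving it out keeps Twilio's default.
        attrs["finishOnKey"] = finish_on_key
    response = ET.Element("Response")
    ET.SubElement(response, "Record", attrs)
    return ET.tostring(response, encoding="unicode")
```

The caveat above still applies: a short timeout can cut the caller off mid-word, and speech that happens while Twilio is fetching the action URL is likely not captured in any segment.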
You cannot currently get live or near-real-time transcription. You will receive the transcription very quickly, but only the timeout and key presses can end a recording and start transcription.
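For the receiving end of that transcription, <Record> also has transcribe and transcribeCallback attributes; when transcription is enabled, Twilio POSTs the result to the callback URL as form-encoded parameters such as TranscriptionStatus, TranscriptionText, RecordingUrl and CallSid. A minimal, framework-agnostic sketch of extracting those from the raw request body follows; the function name and the returned dict shape are assumptions for illustration.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs


def parse_transcription_callback(body: str) -> Optional[Dict[str, str]]:
    """Extract the useful fields from a transcribeCallback POST body.

    Returns None unless the transcription is reported as completed.
    """
    # The body is application/x-www-form-urlencoded; keep the first value per key.
    params = {key: values[0] for key, values in parse_qs(body).items()}
    if params.get("TranscriptionStatus") != "completed":
        return None
    return {
        "call_sid": params.get("CallSid", ""),
        "recording_url": params.get("RecordingUrl", ""),
        "text": params.get("TranscriptionText", ""),
    }
```

Each segment's text arrives in its own callback, so you can key the results by CallSid to assemble a running transcript of the call.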
Hope this helps!
Upvotes: 4