Use a bidirectional stream together with Gather on Twilio

I am using Twilio to receive phone calls, and want to be able to:

use <Stream> to receive live audio, and send back audio data to be played (i.e. a bidirectional stream)
use <Gather> to transcribe the incoming audio data, receive the data, and then use <Gather> once again to transcribe more data

So essentially I would like to start a stream, keep it running for the entire call, and run a series of <Gather>s to transcribe every "sentence" in the incoming audio.

If I send Twilio <Stream> followed by <Gather>, it starts the stream and then hangs until the stream ends, so it never runs <Gather>.

If I send <Gather> followed by <Stream>, it never starts the stream because it hangs at <Gather> until that finishes and then sends the transcribed data to whatever action URL I set up. By then it is far too late to start the stream: even if I could start it at that point, which I doubt, I would have lost a few seconds of the call.

The best I could do is return just <Gather> and start the stream via the Twilio Python library. Unfortunately the library can only start unidirectional streams, so I can't send any audio data back.

If I try to update the call through the API, the update seems to completely overwrite the current actions. Updating the call with <Gather> while it is running a stream seems to kill the stream. The opposite way (updating the call with a <Stream> while it runs a <Gather>) seems to kill the <Gather>.

Using the Python library would be perfect, if only it could start bidirectional streams. Is there any way to achieve this?

Upvotes: 0

Answers (2)

Dos

Reputation: 2562

If I understood correctly, you want to implement bidirectional audio streaming and transcription during a call using Twilio.

I suggest the following steps:

Receive/send audio with Twilio: use the Media Streams API to stream audio to and from your server in real time (the API allows bidirectional audio streaming)
Audio transcription: use the Google Cloud Speech-to-Text API or a similar cloud service (Microsoft Azure Speech, Amazon Transcribe or IBM Watson Speech to Text)
Receive and process the input/output streams on your backend and send the result back through Twilio's API
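As a rough sketch of the message handling on the server side, assuming the JSON message format described in Twilio's Media Streams documentation (each message carries an "event" field, and audio travels as base64-encoded mu-law at 8000 Hz in media.payload). The function names decode_inbound_audio and build_outbound_media are hypothetical helpers for illustration, not part of any Twilio library:

```python
import base64
import json


def decode_inbound_audio(raw_message: str):
    """Return (stream_sid, audio_bytes) for a 'media' message, else None."""
    msg = json.loads(raw_message)
    if msg.get("event") != "media":
        # Other events ("connected", "start", "stop", ...) carry no audio.
        return None
    return msg["streamSid"], base64.b64decode(msg["media"]["payload"])


def build_outbound_media(stream_sid: str, mulaw_audio: bytes) -> str:
    """Wrap raw mu-law audio in a JSON 'media' message to send back to Twilio."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(mulaw_audio).decode("ascii")},
    })
```

The decoded bytes are what you would forward to the transcription service, and the string returned by build_outbound_media is what you would send over the same WebSocket to have Twilio play audio to the caller.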

Here is an example implementation in Python (not tested) of the final part (sending the processed audio back to Twilio). I can add more details if you need them.

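import asyncio
import websockets
import pyaudio

CHUNK = 1024  # frames per read


async def send_audio_data(websocket):
    # Capture 16-bit mono audio at 16 kHz from the local input device.
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=CHUNK)

    try:
        while True:
            # stream.read() blocks, so run it in a worker thread to keep the event loop free.
            data = await asyncio.to_thread(stream.read, CHUNK)
            await websocket.send(data)
    finally:
        # Release the audio device on cancellation or on any error.
        stream.stop_stream()
        stream.close()
        pa.terminate()


async def receive_transcriptions(websocket):
    # Print every message the server sends back.
    async for message in websocket:
        print(f"Received transcription: {message}")


async def main():
    async with websockets.connect("wss://yourserver.com/stream") as websocket:
        # Run both directions concurrently; awaiting the sender alone would never reach the receive loop.
        await asyncio.gather(send_audio_data(websocket), receive_transcriptions(websocket))


asyncio.run(main())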

Upvotes: 0

RJPearson94

Reputation: 106

Some of the question above is missing information, so please correct me if I have misunderstood. It sounds like you want a bidirectional stream that sends the call audio to a third-party app or service.

This should be possible with Twilio Media Streams and TwiML [1]. You can create a custom app that uses WebSockets to receive the media from Twilio and then returns data in the media stream.
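For illustration, here is a minimal sketch of the TwiML your webhook could return to open a bidirectional stream, based on the <Connect><Stream> form described in [1]. It uses only the standard library; build_stream_twiml and the example URL are hypothetical names, and Twilio's Python helper library can generate an equivalent document:

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(websocket_url: str) -> str:
    """Build a TwiML document that connects the call to a WebSocket server."""
    response = ET.Element("Response")
    # <Connect> keeps the call attached to the stream until the stream ends.
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", url=websocket_url)
    return ET.tostring(response, encoding="unicode")


# Hypothetical WebSocket endpoint, shown only as an example.
print(build_stream_twiml("wss://example.com/media"))
```

Because the call stays inside <Connect> while the stream is open, any verbs placed after it do not run until the stream ends, so the per-sentence transcription would happen in the WebSocket app itself.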

Twilio has recently released a blog post [2] showing how Media Streams can be used with OpenAI's Realtime API. Example Python code for that post can be found at https://github.com/twilio-samples/speech-assistant-openai-realtime-api-python

Hope this helps

[1] https://www.twilio.com/docs/voice/twiml/stream

[2] https://www.twilio.com/en-us/blog/voice-ai-assistant-openai-realtime-api-python

Upvotes: 0
