Reputation: 985
I am using Twilio to receive phone calls, and want to be able to:
So essentially I would like to start a stream and keep it running for the entire call, while using a series of `<Gather>`s to transcribe every "sentence" in the incoming audio.
If I send Twilio `<Connect><Stream>` followed by `<Gather>`, it starts the stream and then hangs until the stream ends, so it never gets to the `<Gather>`.
If I send `<Gather>` followed by `<Connect><Stream>`, it never starts the stream: it waits at `<Gather>` until that finishes and then posts the transcription to whatever action URL I set up, by which point it is far too late to start the stream (even if that were possible, which I doubt, I would have lost a few seconds of the call).
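For reference, the two orderings described above correspond roughly to the TwiML documents below. This is a sketch built with the standard library's `xml.etree`; the stream URL and action path are hypothetical placeholders. The third variant uses `<Start><Stream>`, which does not block the rest of the TwiML but only gives a one-way (unidirectional) stream.

```python
import xml.etree.ElementTree as ET

STREAM_URL = "wss://example.com/media"  # hypothetical WebSocket endpoint
GATHER_ACTION = "/transcription"        # hypothetical action URL


def stream(verb):
    """<Connect><Stream> (bidirectional, blocking) or <Start><Stream> (one-way, non-blocking)."""
    wrapper = ET.Element(verb)
    ET.SubElement(wrapper, "Stream", url=STREAM_URL)
    return wrapper


def gather():
    """Speech <Gather> that posts its transcription to GATHER_ACTION."""
    return ET.Element("Gather", input="speech", action=GATHER_ACTION)


def twiml(*verbs):
    """Wrap verbs in <Response>; Twilio executes them in document order."""
    response = ET.Element("Response")
    response.extend(verbs)
    return ET.tostring(response, encoding="unicode")


# <Connect><Stream> holds the call until the stream closes, so <Gather> never runs mid-call
stream_first = twiml(stream("Connect"), gather())
# <Gather> holds the call until it finishes, so the stream starts too late
gather_first = twiml(gather(), stream("Connect"))
# <Start><Stream> returns immediately, but the stream is one-way (no audio back to the caller)
forked = twiml(stream("Start"), gather())

for doc in (stream_first, gather_first, forked):
    print(doc)
```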
The best I could do is return just `<Gather>` and start the stream via the Twilio Python library. Unfortunately, the library can only start unidirectional streams, so I can't send any audio back.
If I try to update the call through the API, the update seems to completely replace the current instructions. Updating the call with `<Gather>` while a stream is running seems to kill the stream, and the opposite (updating the call with `<Connect><Stream>` while a `<Gather>` is running) kills the `<Gather>`.
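The two REST operations mentioned in the question map to plain HTTP requests. The sketch below only builds them with the standard library (nothing is sent); the credentials and SIDs are placeholders, and the helper names are hypothetical. As far as I know, the Streams endpoint is what the Python library's `client.calls(sid).streams.create(url=...)` wraps, while posting a `Twiml` parameter to the Call resource replaces the TwiML the call is currently executing, which matches the behaviour described above.

```python
import base64
import urllib.parse
import urllib.request

API = "https://api.twilio.com/2010-04-01"


def _post(url, params, account_sid, auth_token):
    """Build (but do not send) an authenticated form POST to the Twilio REST API."""
    token = base64.b64encode(f"{account_sid}:{auth_token}".encode()).decode()
    return urllib.request.Request(
        url,
        data=urllib.parse.urlencode(params).encode(),
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )


def start_stream_request(account_sid, auth_token, call_sid, ws_url):
    """Start a (unidirectional) stream on a live call via the Streams resource."""
    return _post(f"{API}/Accounts/{account_sid}/Calls/{call_sid}/Streams.json",
                 {"Url": ws_url}, account_sid, auth_token)


def update_call_request(account_sid, auth_token, call_sid, twiml):
    """Send new TwiML to a live call; it replaces the TwiML currently executing."""
    return _post(f"{API}/Accounts/{account_sid}/Calls/{call_sid}.json",
                 {"Twiml": twiml}, account_sid, auth_token)
```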
Using the Python library would be perfect if only it could start bidirectional streams. Is there any way to achieve this?
Upvotes: 0
Views: 153
Reputation: 2562
If I understood correctly, you want to implement bidirectional audio streaming and transcription during a call using Twilio.
I suggest the following steps:
Here is an example implementation in Python (untested) of the final part (sending the processed audio back to Twilio). I can add more details if you need them.
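```python
import asyncio

import pyaudio
import websockets

CHUNK = 1024  # frames per read


async def send_audio_data(websocket):
    """Capture microphone audio and send raw 16 kHz PCM chunks over the websocket."""
    # Note: Twilio itself expects base64 8 kHz mu-law inside JSON "media" messages
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=16000,
                     input=True, frames_per_buffer=CHUNK)
    try:
        while True:
            # stream.read() blocks, so run it in a thread to keep the event loop responsive
            data = await asyncio.to_thread(stream.read, CHUNK)
            await websocket.send(data)
    finally:
        stream.stop_stream()
        stream.close()
        pa.terminate()


async def receive_transcriptions(websocket):
    """Print every transcription message the server sends back."""
    async for message in websocket:
        print(f"Received transcription: {message}")


async def main():
    async with websockets.connect("wss://yourserver.com/stream") as websocket:
        # Send and receive concurrently; awaiting the endless sender first
        # would mean the receive loop is never reached
        await asyncio.gather(send_audio_data(websocket),
                             receive_transcriptions(websocket))


asyncio.run(main())
```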
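One caveat with the code above: Twilio Media Streams do not accept raw PCM frames. Audio sent back on a bidirectional stream has to be 8 kHz μ-law, base64-encoded inside a JSON `media` message that carries the call's `streamSid`. The sketch below converts 16 kHz 16-bit PCM (the format captured above) into such a message, using a pure-Python G.711 μ-law encoder and a crude 2:1 decimation with no anti-aliasing filter; `linear_to_ulaw` and `pcm16k_to_twilio_media` are hypothetical helper names, and a proper resampler would sound better.

```python
import base64
import json
import struct

BIAS = 0x84   # G.711 mu-law bias
CLIP = 32635  # clip level applied before biasing


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as an 8-bit G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the segment (exponent) from the highest set bit of the magnitude
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not magnitude & mask:
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def pcm16k_to_twilio_media(pcm: bytes, stream_sid: str) -> str:
    """Turn 16 kHz little-endian int16 PCM into a Twilio 'media' message (JSON string)."""
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    # Crude 2:1 decimation to 8 kHz: keep every other sample
    ulaw = bytes(linear_to_ulaw(s) for s in samples[::2])
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(ulaw).decode("ascii")},
    })
```

Inside `send_audio_data`, the call would then become something like `await websocket.send(pcm16k_to_twilio_media(data, stream_sid))`, with `stream_sid` taken from Twilio's `start` event.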
Upvotes: 0
Reputation: 106
Some information seems to be missing from the question above, so please correct me if I have misunderstood. It sounds like you want a bidirectional stream that sends the call audio to a third-party app or service.
This should be possible with Twilio Media Streams and TwiML [1]. You can create a custom app that uses WebSockets to receive the media from Twilio and then send data back over the same media stream.
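A minimal sketch of the receiving side, assuming the JSON event format Twilio uses on Media Streams (`start`, `media`, `stop`): the session records the `streamSid` from the `start` event, collects the caller's μ-law audio for a transcriber, and echoes each chunk back to prove the return path works. `TwilioStreamSession` and `serve_call` are hypothetical names; with recent versions of the `websockets` library, `serve_call` should be usable as the connection handler passed to `websockets.serve`.

```python
import base64
import json


class TwilioStreamSession:
    """Hypothetical per-call state for one bidirectional <Connect><Stream>."""

    def __init__(self):
        self.stream_sid = None
        self.inbound = bytearray()  # caller audio (8 kHz mu-law) for your transcriber

    def handle(self, raw: str):
        """Process one message from Twilio; return a JSON reply to send, or None."""
        msg = json.loads(raw)
        event = msg.get("event")
        if event == "start":
            self.stream_sid = msg["start"]["streamSid"]
        elif event == "media" and self.stream_sid:
            payload = msg["media"]["payload"]
            self.inbound.extend(base64.b64decode(payload))
            # Echo the caller's audio straight back, just to prove the return path works
            return json.dumps({"event": "media", "streamSid": self.stream_sid,
                               "media": {"payload": payload}})
        elif event == "stop":
            self.stream_sid = None
        return None


async def serve_call(websocket):
    """Drive a session from any async-iterable connection that has an async send()."""
    session = TwilioStreamSession()
    async for raw in websocket:
        reply = session.handle(raw)
        if reply is not None:
            await websocket.send(reply)
```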
Twilio recently released a blog post [2] showing how Media Streams can be used with OpenAI's Realtime API. There is example Python code for this post at https://github.com/twilio-samples/speech-assistant-openai-realtime-api-python
Hope this helps
[1] https://www.twilio.com/docs/voice/twiml/stream
[2] https://www.twilio.com/en-us/blog/voice-ai-assistant-openai-realtime-api-python
Upvotes: 0