Twilio media stream used simultaneously with Speech to Text (Twilio Say)

Question

I'm working on a voicebot that uses a Twilio media stream (with Google STT) to transcribe the caller, processes the text, and responds to the user with the TwiML Say verb. I'm using an endpoint that is triggered as soon as the user calls (the call status is "ringing"):

@app.route("/twilio_call", methods=['POST'])
def voice(request):
    """Respond to incoming phone calls with a greeting message"""
    call_status = request.form.get("CallStatus")

    if call_status == "ringing":
        voice_response = VoiceResponse()
        voice_response.say("hello! how can I help!", voice='woman')

        return response.text(str(voice_response), content_type="text/xml")

After this message has been played to the user, I want to trigger the websocket server with the media stream directly.

@app.websockets('/media')
def transcribe_media(request, ws):
    while True:
        message = ws.recv()
        if message is None:
            continue

        # Parse each message inside the loop so every frame is handled
        data = json.loads(message)
        if data['event'] == "media":

            ....
            # here I'm sending data to Google and getting the transcription back

I cannot modify the call in progress as described in the documentation: https://www.twilio.com/docs/voice/tutorials/how-to-modify-calls-in-progress-python

I have already tried this:

client = Client(ACCOUNT_SID, AUTH_TOKEN)
call = client.calls(conversation_id).update(
    twiml='<Response><Say>' + msg_text + '</Say></Response>')

However, I'm getting an error saying that the call status is not "in-progress" (it is "ringing").

I also tried the TwiML Stream element, but it did not start the server when used together with the TwiML Say verb (it does trigger the server when I pass only Stream):

voice_response = VoiceResponse()
start = Start()
start.stream(url='wss://.ngrok.io/webhooks/twilio_voice/media')
voice_response.say("Hello, how can I help?", language="en-GB")
voice_response.append(start)
return response.text(str(voice_response), content_type="text/xml")

Does anybody know how I can approach this problem? How can I trigger the websocket server after the TwiML Say has been played to the user?

philnash · Accepted Answer

Twilio developer evangelist here.

The correct way to achieve this is with the Stream TwiML element. I recommend placing the stream at the start of the TwiML response so that the connection is established in time to receive the user's speech. Also, once the TwiML is complete, Twilio will hang up the call, even if there is a live stream, so you should pause to wait for the user's voice response.

So, I would alter your webhook endpoint to this:

@app.route("/twilio_call", methods=['POST'])
def voice(request):
    """Respond to incoming phone calls with a greeting message"""
    call_status = request.form.get("CallStatus")

    voice_response = VoiceResponse()

    # Start the stream first so it is connected before the caller speaks
    start = Start()
    start.stream(url='wss://.ngrok.io/webhooks/twilio_voice/media')
    voice_response.append(start)

    voice_response.say("hello! how can I help!", voice='woman')
    # Keep the call open while waiting for the caller to respond
    voice_response.pause(length=60)

    return response.text(str(voice_response), content_type="text/xml")

Now your stream should connect to your websocket endpoint and your user will hear the greeting. The call will not hang up because there is a 60 second pause. When the user speaks, your websocket endpoint can send the speech to the STT service, and when you get a response, you can redirect the call and use <Say> to speak the result.
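
As a rough sketch of that last step, the helpers below pull the call SID and audio out of the media stream messages and build the TwiML for the spoken reply. The function names (parse_stream_message, build_reply_twiml, speak_reply) are made up for illustration, and client is assumed to be a twilio.rest.Client instance that you create elsewhere. The reply TwiML ends with another pause on the same assumption as above: the call should stay open while you wait for the next thing the user says.

```python
import base64
import json
from xml.sax.saxutils import escape


def parse_stream_message(raw):
    """Return (event, call_sid, audio_bytes) for one media stream message.

    call_sid is only set for "start" messages and audio_bytes only for
    "media" messages; otherwise they are None.
    """
    data = json.loads(raw)
    event = data.get("event")
    call_sid = None
    audio = None
    if event == "start":
        # The "start" message identifies the call that is being streamed.
        call_sid = data["start"]["callSid"]
    elif event == "media":
        # The audio payload arrives base64-encoded.
        audio = base64.b64decode(data["media"]["payload"])
    return event, call_sid, audio


def build_reply_twiml(text, pause_seconds=60):
    """Build TwiML that speaks `text`, then pauses so the call stays open."""
    return (
        "<Response>"
        f"<Say>{escape(text)}</Say>"
        f'<Pause length="{int(pause_seconds)}"/>'
        "</Response>"
    )


def speak_reply(client, call_sid, text):
    """Update the in-progress call so that it speaks `text`.

    `client` is assumed to be a twilio.rest.Client instance.
    """
    return client.calls(call_sid).update(twiml=build_reply_twiml(text))
```

In the websocket handler you would remember the call SID from the "start" message, feed the decoded audio to your STT service, and call speak_reply(client, call_sid, answer_text) once you have an answer. By then the call has been answered, so the update should no longer fail because the call is still ringing.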
