Reputation: 1150
I followed this example and managed to collect the audio buffers from my microphone and send them to Dialogflow:
https://cloud.google.com/dialogflow-enterprise/docs/detect-intent-stream
But this processing is sequential: I first have to collect all the audio buffers before I can send them to Dialogflow. I then get the correct result, along with the intermediate results, but only after waiting for the person to stop talking.
I would like to stream the audio buffers to Dialogflow instantly, while somebody is still talking, and get the intermediate results right away.
Does anybody know whether this is possible and can point me in the right direction? My preferred language is Python.
Thanks a lot!
Upvotes: 2
Views: 1189
Reputation: 581
While trying to do something similar recently, I found that someone already had this problem and figured it out. Basically, you can feed an audio stream to Dialogflow via the streamingDetectIntent method and get intermediate results as valid language is recognized in the audio input. The tricky bit is that you need to set a threshold on your input stream so that the stream is ended once the user stops talking for a set duration. Closing the stream serves the same purpose as reaching the end of an audio file, and it triggers the intent-matching attempt.
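In Python, that flow might be sketched roughly as below, assuming the google-cloud-dialogflow v2 client library (`pip install google-cloud-dialogflow`); the project ID, session ID, sample rate, and audio source are all placeholders you would replace with your own:

```python
def chunk_audio(data: bytes, chunk_size: int = 4096):
    """Split raw PCM bytes into fixed-size chunks for streaming."""
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]


def detect_intent_stream(project_id: str, session_id: str, audio_chunks):
    # Imported lazily so chunk_audio() stays usable without the client library.
    from google.cloud import dialogflow

    client = dialogflow.SessionsClient()
    session = client.session_path(project_id, session_id)

    def requests():
        # The first request carries the session and the audio config ...
        audio_config = dialogflow.InputAudioConfig(
            audio_encoding=dialogflow.AudioEncoding.AUDIO_ENCODING_LINEAR_16,
            sample_rate_hertz=16000,
            language_code="en-US",
            single_utterance=True,  # let Dialogflow end the stream on silence
        )
        yield dialogflow.StreamingDetectIntentRequest(
            session=session,
            query_input=dialogflow.QueryInput(audio_config=audio_config),
        )
        # ... and every later request carries only raw audio bytes.
        for chunk in audio_chunks:
            yield dialogflow.StreamingDetectIntentRequest(input_audio=chunk)

    for response in client.streaming_detect_intent(requests=requests()):
        # Interim transcripts arrive while the user is still speaking;
        # the matched intent arrives once the stream ends.
        if response.recognition_result.transcript:
            print("Interim:", response.recognition_result.transcript)
        if response.query_result.intent.display_name:
            print("Matched intent:", response.query_result.intent.display_name)
```

Running this requires Google Cloud credentials, so treat it as a starting point rather than a drop-in solution.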
The solution linked above uses SoX to stream audio from an external device. The nice thing about this approach is that SoX already has options for setting audio-level thresholds to start/stop the streaming process (look at the silence option), so you can fine-tune the settings to work for your needs. If you're not using NodeJS, you may need to write your own utility to handle initiating the audio stream, but hopefully this can point you in the right direction.
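If you end up rolling your own utility instead of SoX, the idea behind its silence option can be sketched in a few lines of Python; the threshold and chunk-count values below are illustrative, not tuned:

```python
import math
import struct


def rms(chunk: bytes) -> float:
    """Root-mean-square level of a chunk of 16-bit little-endian PCM audio."""
    count = len(chunk) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, chunk[:count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def stream_until_silence(chunks, threshold=500.0, max_silent_chunks=20):
    """Pass audio chunks through until `max_silent_chunks` consecutive
    chunks fall below `threshold`, then stop. Ending this generator is
    what closes the request stream and triggers the intent match."""
    silent = 0
    for chunk in chunks:
        yield chunk
        if rms(chunk) < threshold:
            silent += 1
            if silent >= max_silent_chunks:
                return
        else:
            silent = 0
```

Feeding the output of stream_until_silence(...) into the request generator would close the stream automatically once the speaker pauses long enough.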
Upvotes: 0
Reputation: 1150
I got this answer from the Dialogflow support team:
From the Dialogflow documentation: "Recognition ceases when it detects the audio's voice has stopped or paused. In this case, once a detected intent is received, the client should close the stream and start a new request with a new stream as needed." This means that the user has to stop or pause speaking in order for you to send it to Dialogflow.
In order for Dialogflow to detect a proper intent, it has to have the full user utterance.
If you are looking for real-time speech recognition, look into our Speech-to-Text product (https://cloud.google.com/speech-to-text/).
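For reference, streaming recognition with interim results looks roughly like this in Python, assuming the google-cloud-speech client library (`pip install google-cloud-speech`); mic_chunks stands in for whatever microphone chunk iterator you use:

```python
def transcribe_stream(mic_chunks, sample_rate=16000, language="en-US"):
    # Imported lazily; running this needs Google Cloud credentials.
    from google.cloud import speech

    client = speech.SpeechClient()
    streaming_config = speech.StreamingRecognitionConfig(
        config=speech.RecognitionConfig(
            encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
            sample_rate_hertz=sample_rate,
            language_code=language,
        ),
        interim_results=True,  # partial transcripts while the user speaks
    )
    requests = (speech.StreamingRecognizeRequest(audio_content=chunk)
                for chunk in mic_chunks)
    for response in client.streaming_recognize(streaming_config, requests):
        for result in response.results:
            tag = "final" if result.is_final else "interim"
            print(tag, result.alternatives[0].transcript)
```

Unlike Dialogflow, this gives you transcripts only; you would still need to match intents yourself, which is the trade-off the support team is pointing at.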
Upvotes: 3