Reputation: 1929
How does Google Speech API's SingleUtterance work?
According to the docs, it is Google's way of determining when a speaker has finished a single utterance. I understand what it does, but I would like to know how it does it. Does the API simply wait for a certain duration of "speechless" audio? If so, how long a stretch of voiceless audio will trigger the end of an utterance?
Does it have some other sort of AI algorithm that helps determine when someone has stopped speaking?
Thanks
Upvotes: 5
Views: 3919
Reputation: 2099
I don't think the details are exposed. In my opinion, detecting the end of the audio is an internal decision of the API. Instead, the API offers a way to identify when that decision has been made.
Under normal conditions, the stream continues to listen and process audio until either the stream is closed directly or the stream's length limit has been exceeded. In that situation, single_utterance does not need to be set.
When you do need it (voice commands, for example) and set single_utterance=true, the API decides when to finish recognition, sends your client the END_OF_SINGLE_UTTERANCE event, and ceases recognition.
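To make the client side of this concrete, here is a rough sketch of how a response loop might react to that event. It does not call the real service: SpeechEventType, FakeResult, FakeResponse, AudioSender and handle_single_utterance are all hypothetical stand-ins I made up, loosely shaped like the streaming responses (a speech_event_type field plus a list of results with an is_final flag). It also assumes that a final result can still arrive after the event, so the loop stops sending audio but keeps reading.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterable, List, Optional


class SpeechEventType(Enum):
    # Hypothetical local stand-in for the API's event enum.
    SPEECH_EVENT_UNSPECIFIED = 0
    END_OF_SINGLE_UTTERANCE = 1


@dataclass
class FakeResult:
    # Hypothetical stand-in for one recognition result.
    transcript: str
    is_final: bool = False


@dataclass
class FakeResponse:
    # Hypothetical stand-in for one streaming response message.
    speech_event_type: SpeechEventType = SpeechEventType.SPEECH_EVENT_UNSPECIFIED
    results: List[FakeResult] = field(default_factory=list)


class AudioSender:
    """Hypothetical handle for whatever feeds microphone audio to the stream."""

    def __init__(self) -> None:
        self.sending = True

    def stop(self) -> None:
        self.sending = False


def handle_single_utterance(
    responses: Iterable[FakeResponse], sender: AudioSender
) -> Optional[str]:
    """Stop sending audio on END_OF_SINGLE_UTTERANCE, then keep reading
    until the stream ends and return the last final transcript, if any."""
    final_transcript = None
    for response in responses:
        if response.speech_event_type is SpeechEventType.END_OF_SINGLE_UTTERANCE:
            # The service decided the utterance is over; we only react to it.
            sender.stop()
            continue
        for result in response.results:
            if result.is_final:
                final_transcript = result.transcript
    return final_transcript
```

With a simulated stream of an interim result, the event, and then a final result, the function stops the sender at the event and returns the final transcript. Note that nothing here decides when the utterance ends; that stays entirely on the API side, which is the point of the answer above.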
Upvotes: 2