I very simply want to pass in a tiny audio clip (8 kHz telephony) containing a single spoken digit, and get back that digit as text, with recognition restricted to numbers.
File in > number as text out. Preferably via the Python API.
The problem is that, by default, it recognises spoken 1, 2, 3, 4, 5 as "won", "too", "free", "fore", "5"... no good!
I believe I want what is called a grammar? Or something like the number slot types Amazon uses in Alexa? I've looked over the Cloud Speech docs and can't find it. The only thing I could think of is looping over the alternatives returned and seeing if any of them is an int rather than a word. And if none do, then what?
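The "loop over the alternatives" idea can be sketched in plain Python. This is a minimal sketch, not a Google API feature: it takes the transcript strings of the alternatives (ranked best first), returns the first one that is literally a single digit, and only then falls back to a small homophone table. The `HOMOPHONES` table and the `pick_digit` name are hypothetical and would need tuning against real recordings; returning `None` leaves the "then what?" decision (e.g. re-prompting the caller) to the application.

```python
import re
from typing import Iterable, Optional

# Hypothetical homophone table (an assumption, not part of any Google API).
HOMOPHONES = {
    "zero": 0, "oh": 0,
    "one": 1, "won": 1,
    "two": 2, "too": 2, "to": 2,
    "three": 3, "free": 3, "tree": 3,
    "four": 4, "for": 4, "fore": 4,
    "five": 5,
    "six": 6,
    "seven": 7,
    "eight": 8, "ate": 8,
    "nine": 9,
}


def pick_digit(alternatives: Iterable[str]) -> Optional[int]:
    """Return a digit from ranked transcripts, or None if nothing fits."""
    alts = list(alternatives)
    # First pass: prefer an alternative that is literally one ASCII digit, e.g. "5".
    for text in alts:
        stripped = text.strip()
        if re.fullmatch(r"[0-9]", stripped):
            return int(stripped)
    # Second pass: map a known word or homophone ("won", "fore") to its digit.
    for text in alts:
        word = text.strip().lower().rstrip(".!?")
        if word in HOMOPHONES:
            return HOMOPHONES[word]
    # Nothing usable: let the caller decide (e.g. ask again).
    return None
```

With `max_alternatives` set above 1, this would be fed each `alternative.transcript` from the response, in order.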
Thanks.
A.Queue's answer is correct; however, in case others are bitten by the docs:
The link given suggests:
{ "phrases": [ string], }
The Python documentation says:
speech_contexts
Optional: A means to provide context to assist the speech recognition.
The Python examples show:
language_code='en-US',
max_alternatives=max_alternatives,
profanity_filter=True,
speech_contexts=['Google', 'cloud'],
What actually works is:
speech_contexts=[speech.types.SpeechContext(
phrases=['Google', 'cloud'],
)]
I got this from a Googler on Slack, who pointed me to alternative, more comprehensive and accurate documentation. Bookmark that last link for future sanity.
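For context, here is a fuller sketch of the same call. It assumes google-cloud-speech 2.x, where the message classes live directly on the module (`speech.SpeechContext`, `speech.RecognitionConfig`); the `speech.types.SpeechContext` spelling in the answer comes from older 1.x releases. The LINEAR16 encoding is an assumption: 8 kHz telephony audio is often mu-law, in which case `MULAW` would be the right `AudioEncoding`. The names `DIGIT_PHRASES` and `transcribe_digit` are hypothetical, and the import sits inside the function so the phrase list can be used without the package installed.

```python
# Phrase hints for single digits (hypothetical helper list).
DIGIT_PHRASES = [str(d) for d in range(10)]


def transcribe_digit(path, sample_rate_hertz=8000, max_alternatives=5):
    """Return all alternative transcripts for a short clip, best first."""
    # Assumes google-cloud-speech >= 2.0 and application default credentials.
    from google.cloud import speech

    client = speech.SpeechClient()
    with open(path, "rb") as f:
        audio = speech.RecognitionAudio(content=f.read())
    config = speech.RecognitionConfig(
        # Assumption: raw 16-bit PCM; use MULAW for mu-law telephony audio.
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=sample_rate_hertz,
        language_code="en-US",
        max_alternatives=max_alternatives,
        # A list of SpeechContext messages, not a bare list of strings.
        speech_contexts=[speech.SpeechContext(phrases=DIGIT_PHRASES)],
    )
    response = client.recognize(config=config, audio=audio)
    return [alt.transcript for result in response.results for alt in result.alternatives]
```

Phrase hints only bias recognition, so the returned transcripts may still need checking for an actual digit.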
Try adding speechContexts. You can then add a few phrases that you think are most probable.
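If you call the REST endpoint rather than a client library, `speechContexts` goes inside the `config` object as a list of `{"phrases": [...]}` objects. The sketch below only builds the JSON body for `POST https://speech.googleapis.com/v1/speech:recognize` using the standard library; sending it still needs an API key or OAuth token. The function name, the LINEAR16 encoding and the `maxAlternatives` value are assumptions for illustration.

```python
import base64
import json


def build_recognize_body(audio_bytes, phrases, sample_rate_hertz=8000):
    """Build a v1 speech:recognize request body with phrase hints (sketch)."""
    return {
        "config": {
            "encoding": "LINEAR16",  # assumption: raw 16-bit PCM audio
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": "en-US",
            "maxAlternatives": 5,
            # One SpeechContext object: {"phrases": [string]}
            "speechContexts": [{"phrases": list(phrases)}],
        },
        # Inline audio must be base64-encoded in the JSON body.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


body = build_recognize_body(b"\x00\x01", [str(d) for d in range(10)])
print(json.dumps(body["config"]["speechContexts"]))
# prints [{"phrases": ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]}]
```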