talkingnews

Reputation: 211

Google cloud speech to text grammar to narrow results to a number?

I simply want to pass in a tiny audio clip (8 kHz telephony) containing a single spoken digit and get back that digit as text, with recognition narrowed down to numbers.

File in, number as text out. Preferably via the Python command-line API.

The problem is that, by default, it recognises 1, 2, 3, 4, 5 as "won", "too", "free", "fore", 5 ... no good!

I believe I want what is called a grammar? Or something like the number slot types Amazon uses in Alexa? I've looked over the Cloud Speech docs and can't find it. The only thing I could think of is looping over the alternatives returned and seeing whether any of them is an int rather than a word. And if none is, then what?
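The fallback described above could be sketched roughly as follows. This is only an illustration: `pick_digit` and the `HOMOPHONES` table are hypothetical names, the table is a guess at likely mis-hearings rather than anything the service defines, and the input is assumed to be a plain list of transcript strings already pulled out of the alternatives.

```python
# Hypothetical helper, not part of any Google library.
# HOMOPHONES is an assumed, hand-written table of likely mis-hearings.
DIGITS = set("0123456789")

HOMOPHONES = {
    "zero": 0, "oh": 0,
    "one": 1, "won": 1,
    "two": 2, "too": 2, "to": 2,
    "three": 3, "free": 3,
    "four": 4, "for": 4, "fore": 4,
    "five": 5,
    "six": 6,
    "seven": 7,
    "eight": 8, "ate": 8,
    "nine": 9,
}


def pick_digit(transcripts):
    """Return a single digit found among the alternatives, or None."""
    # Normalise whitespace, case and a trailing full stop.
    cleaned = [t.strip().lower().rstrip(".") for t in transcripts]

    # First pass: prefer an alternative that is already a digit character.
    for text in cleaned:
        if text in DIGITS:
            return int(text)

    # Second pass: fall back to the homophone table.
    for text in cleaned:
        if text in HOMOPHONES:
            return HOMOPHONES[text]

    # Nothing matched; the caller has to decide what to do (e.g. re-prompt).
    return None
```

If no alternative matches, the function returns `None`, which leaves the "then what?" decision to the caller.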

Thanks.

Upvotes: 1

Views: 1362

Answers (2)

talkingnews

Reputation: 211

A.Queue's answer is correct. However, in case others are bitten by the docs:

The link given suggests:

{ "phrases": [ string], } 

The python documentation says:

speech_contexts

Optional: A means to provide context to assist the speech recognition.

The python examples show:

language_code='en-US',
max_alternatives=max_alternatives,
profanity_filter=True,
speech_contexts=['Google', 'cloud'],

What actually works is:

speech_contexts=[speech.types.SpeechContext(
    phrases=['Google', 'cloud'],
)]

I got this from a Googler on Slack, who pointed me to alternative, more comprehensive and accurate documentation. Bookmark that last link for future sanity.

Upvotes: 4

A.Queue

Reputation: 1572

Try adding speechContexts to the recognition config. You can then list a few phrases that you think are most probable.
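As a rough sketch of what that looks like in a REST-style request body: the helper name `build_recognize_request` is hypothetical, and the `MULAW` encoding, `maxAlternatives` value and choice of hint phrases are assumptions for an 8 kHz telephony clip, so adjust them to your audio. Whether digit characters or spelled-out words make better hints is something to try out, not something this sketch settles.

```python
import base64
import json


def build_recognize_request(audio_bytes, phrases):
    """Build a recognize request body with phrase hints (hypothetical helper)."""
    return {
        "config": {
            "encoding": "MULAW",        # assumption: mu-law telephony audio
            "sampleRateHertz": 8000,    # 8 kHz, as in the question
            "languageCode": "en-US",
            "maxAlternatives": 5,       # assumption: ask for a few alternatives
            # The phrase hints: a list of context objects, each with "phrases".
            "speechContexts": [{"phrases": list(phrases)}],
        },
        # Audio content is sent base64-encoded in the JSON body.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


if __name__ == "__main__":
    hints = [str(d) for d in range(10)]
    body = build_recognize_request(b"\x00\x01\x02", hints)
    print(json.dumps(body, indent=2))
```

Note that speechContexts is a list of objects that each carry a phrases list, not a bare list of strings.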

Upvotes: 2
