Reputation: 361
Can the Google Speech API be configured to only return numbers and letters, as opposed to full words?
The use case is transcribing Canadian postal codes, e.g. M 1 B 0 R 3, for which Google may return "Em 1 Be 0 Are 3".
We have tried:
- speechContexts, feeding in the letters A-Z as individual phrases. This improved the accuracy for us. We did not have much success passing in individual numbers (e.g. 1, 2, 3).
- encoding and sampleRateHertz configuration options. We saw no improvement from setting these, as we believe Google already does a great job of auto-detecting the sample rate and encoding.
Our audio file is 8000 Hz and encoded with "M-ULAW". We have no flexibility in changing the sample rate or encoding.
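For reference, the configuration described above can be sketched roughly as the following request config, built as a plain dictionary in the shape of the REST API's RecognitionConfig. The helper name build_recognition_config and the en-CA language code are assumptions made for illustration, not part of our actual setup.

```python
import json
import string


def build_recognition_config(language_code="en-CA"):
    """Build a RecognitionConfig-shaped dict (hypothetical helper).

    The language code default is an assumption; the encoding and sample
    rate mirror the 8000 Hz mu-law audio described in the question.
    """
    # One phrase per letter, A through Z, as described above.
    letters = list(string.ascii_uppercase)
    return {
        "encoding": "MULAW",
        "sampleRateHertz": 8000,
        "languageCode": language_code,
        "speechContexts": [{"phrases": letters}],
    }


config = build_recognition_config()
print(json.dumps(config, indent=2))
```

Digits are left out of the phrase list here because passing them individually did not help in our tests.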
Is there a way to get a more accurate response from Google for this use case? Even ideas for better speechContexts phrases are welcome.
Thank you
Upvotes: 4
Views: 2082
Reputation: 11
We are experiencing the same results. We would love to have a syntax-based "context" suggestion, or a parameter that forces digit-only output.
Changing the API version doesn't fix the way digits are recognised, not even with model: phone_call.
What actually worked better for recognising some kinds of numbers was switching to the en_US locale, which in turn made the recognition engine identify a list of numbers as a phone number. The result came back in phone-like syntax, +XXX-XXX-XXX-XXXX, and this made detection really good.
So I don't understand why Google has syntax matching behind the scenes but doesn't make it available through their API.
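Until something like that is exposed, one possible workaround is to do the syntax matching yourself on the returned transcript. Below is a minimal sketch, assuming transcripts like the "Em 1 Be 0 Are 3" example from the question. The SPOKEN_LETTERS table and the normalize_postal_code function are hypothetical names, and the mapping is deliberately incomplete: it would need to be extended with whatever spellings actually show up in your results.

```python
import re

# Hypothetical, incomplete mapping from transcribed letter names to letters.
# Extend it with the spellings you actually observe in transcripts.
SPOKEN_LETTERS = {
    "ay": "A", "be": "B", "bee": "B", "see": "C", "sea": "C",
    "dee": "D", "gee": "G", "jay": "J", "kay": "K", "el": "L",
    "em": "M", "en": "N", "oh": "O", "pea": "P", "pee": "P",
    "are": "R", "tea": "T", "tee": "T", "you": "U", "why": "Y",
}

# Canadian postal codes alternate letter, digit, letter, digit, letter, digit.
POSTAL_CODE = re.compile(r"^[A-Z]\d[A-Z]\d[A-Z]\d$")


def normalize_postal_code(transcript):
    """Return a postal code like 'M1B 0R3', or None if no match is possible."""
    # Split into alphabetic words and single digits.
    tokens = re.findall(r"[A-Za-z]+|\d", transcript)
    chars = []
    for token in tokens:
        if token.isdigit():
            chars.append(token)
        elif len(token) == 1:
            chars.append(token.upper())
        else:
            letter = SPOKEN_LETTERS.get(token.lower())
            if letter is None:
                return None  # unknown word, give up rather than guess
            chars.append(letter)
    code = "".join(chars)
    if not POSTAL_CODE.match(code):
        return None
    return code[:3] + " " + code[3:]


print(normalize_postal_code("Em 1 Be 0 Are 3"))  # prints "M1B 0R3"
```

Returning None for anything that does not fit the pattern lets the caller re-prompt the user instead of accepting a wrong code.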
Upvotes: 1