Tim

Reputation: 1

Google Speech API transcribe email

I'm having trouble transcribing email addresses with the Google Speech REST API. The best I can get is most of the address, because Google Speech ignores "dot" and "dot com". For example, [email protected] comes back as "First Last at gmail". If I say "period" instead of "dot", I at least get "First. Last at gmail." I'm using the following request:

{
  "config": {
      "encoding": "MULAW",
      "sampleRateHertz": 8000,
      "languageCode": "en-US",
      "maxAlternatives": 0,
      "profanityFilter": true,
      "enableWordTimeOffsets": false,
      "model": "phone_call",
      "useEnhanced": true
  },
  "audio": {
      "content":"&&NameBase64&&"
  }
}
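The `"content"` field has to carry the recording as a base64-encoded string; `&&NameBase64&&` above is a placeholder for it. A minimal sketch of building the same body in Python, assuming you already have raw 8 kHz MULAW bytes (`build_recognize_body` is a hypothetical helper name, not part of any Google library):

```python
import base64
import json


def build_recognize_body(mulaw_audio: bytes) -> dict:
    """Hypothetical helper: mirror the request body shown above.

    Assumes 8 kHz MULAW phone audio; "content" must be the raw bytes
    base64-encoded and decoded to an ASCII string for JSON.
    """
    return {
        "config": {
            "encoding": "MULAW",
            "sampleRateHertz": 8000,
            "languageCode": "en-US",
            "maxAlternatives": 0,
            "profanityFilter": True,
            "enableWordTimeOffsets": False,
            "model": "phone_call",
            "useEnhanced": True,
        },
        "audio": {"content": base64.b64encode(mulaw_audio).decode("ascii")},
    }


if __name__ == "__main__":
    # Placeholder bytes, not real speech; only shows the serialized shape.
    body = build_recognize_body(b"\xff\x7f\x00")
    print(json.dumps(body, indent=2))
```

Letting `json.dumps` produce the payload also avoids hand-editing mistakes such as trailing commas, which the API's JSON parser rejects.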

I've tried adding "dot" as a speech context, with no change. Using ".", ".com", "com", and "kom" as phrases didn't change the results either.

{
  "config": {
      "encoding": "MULAW",
      "sampleRateHertz": 8000,
      "languageCode": "en-US",
      "maxAlternatives": 1,
      "profanityFilter": true,
      "enableWordTimeOffsets": false,
      "model": "phone_call",
      "useEnhanced": true,
      "speechContexts": [{
        "phrases": ["dot"]
        }]
  },
  "audio": {
      "content":"Base64Recording"
  }
}

I've also tried adding alphanumeric speech contexts and spelling the address out, but the results were pretty bad.

Any thoughts on how to get "." (or "dot") and "com" to show up in the transcription would be greatly appreciated.

Upvotes: 0

Views: 276

Answers (1)

Subhash Peshwa

Reputation: 11

Have you tried setting a boost value for the phrase? I'm facing the same issue, and I noticed that increasing the boost value helped the recognizer pick up the word "dot". Boost values are usually between 0 and 20; in my tests, anything above 10 helped in recognizing "dot".

Here's an example:

{
  "config": {
      "encoding": "MULAW",
      "sampleRateHertz": 8000,
      "languageCode": "en-US",
      "maxAlternatives": 1,
      "profanityFilter": true,
      "enableWordTimeOffsets": false,
      "model": "phone_call",
      "useEnhanced": true,
      "speechContexts": [{
        "phrases": ["dot"],
        "boost": 15.0
        }]
  },
  "audio": {
      "content":"Base64Recording"
  }
}
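For anyone calling the REST endpoint from code, here is a minimal sketch of the same boosted request in Python using only the standard library. It assumes API-key authentication via the `key` query parameter and that the endpoint version you call accepts `boost` in `speechContexts`; `build_body_with_boost` and `recognize` are hypothetical helper names.

```python
from __future__ import annotations

import base64
import json
import urllib.request

# Speech-to-Text v1 synchronous recognition endpoint.
RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_body_with_boost(mulaw_audio: bytes, phrases: list[str], boost: float) -> dict:
    """Hypothetical helper: one speech context whose phrases share a boost."""
    return {
        "config": {
            "encoding": "MULAW",
            "sampleRateHertz": 8000,
            "languageCode": "en-US",
            "maxAlternatives": 1,
            "profanityFilter": True,
            "enableWordTimeOffsets": False,
            "model": "phone_call",
            "useEnhanced": True,
            # The key is "boost", placed next to "phrases" in each context.
            "speechContexts": [{"phrases": phrases, "boost": boost}],
        },
        "audio": {"content": base64.b64encode(mulaw_audio).decode("ascii")},
    }


def recognize(body: dict, api_key: str) -> dict:
    """POST the body and return the parsed JSON response (assumes API-key auth)."""
    req = urllib.request.Request(
        f"{RECOGNIZE_URL}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage would look like `recognize(build_body_with_boost(audio_bytes, ["dot"], 15.0), api_key)`, where `audio_bytes` and `api_key` are your own values.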

You can also pass multiple speech contexts, each with its own boost value. For example, this is what I use to detect email addresses:

[
  {
    "phrases": ["$OOV_CLASS_ALPHANUMERIC_SEQUENCE"],
    "boost": 14.0
  },
  {
    "phrases": ["gmail.com", "yahoo.com", "aol.com", "outlook.com"],
    "boost": 5.0
  },
  {
    "phrases": ["com", ".", "c o m", ".com", "dotcom", "dot com", "dot", "at", "at the rate", "@"],
    "boost": 10.0
  },
  {
    "phrases": ["org", "io", "dot org", "dot io", "gov", "dot gov", "net", "dot net", "co", "dot co"],
    "boost": 8.0
  },
  {
    "phrases": ["$OOV_CLASS_DIGIT_SEQUENCE", "8", "naught", "z", "zed", "zee", "zz", "d", "aa", "ae", "ee", "oo", "ii", "ay", "eh", "ahh", "ah", "ze", "dee",
                "1", "2", "3", "4", "5", "6", "7", "9", "0", "zero"],
    "boost": -20.0
  }
]
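Hand-maintaining an array like this makes it easy to introduce duplicate phrases or JSON syntax errors. One possible approach, sketched here under the assumption that you build requests in Python, is to keep the phrases grouped by boost in a dict and generate the `speechContexts` array from it. `build_speech_contexts` and `EMAIL_BOOSTS` are hypothetical names; the phrase lists are copied from the contexts above.

```python
import json


def build_speech_contexts(boost_to_phrases: dict) -> list:
    """Hypothetical helper: turn {boost: [phrases]} into a speechContexts array.

    Produces one context per boost value and drops duplicate phrases
    while preserving their original order.
    """
    contexts = []
    for boost, phrases in boost_to_phrases.items():
        unique = list(dict.fromkeys(phrases))
        contexts.append({"phrases": unique, "boost": float(boost)})
    return contexts


EMAIL_BOOSTS = {
    14.0: ["$OOV_CLASS_ALPHANUMERIC_SEQUENCE"],
    5.0: ["gmail.com", "yahoo.com", "aol.com", "outlook.com"],
    10.0: ["com", ".", "c o m", ".com", "dotcom", "dot com", "dot", "at", "at the rate", "@"],
    8.0: ["org", "io", "dot org", "dot io", "gov", "dot gov", "net", "dot net", "co", "dot co"],
    # Negative boost: discourage tokens that tend to be misrecognized.
    -20.0: ["$OOV_CLASS_DIGIT_SEQUENCE", "8", "naught", "z", "zed", "zee", "zz", "d",
            "aa", "ae", "ee", "oo", "ii", "ay", "eh", "ahh", "ah", "ze", "dee",
            "1", "2", "3", "4", "5", "6", "7", "9", "0", "zero"],
}

if __name__ == "__main__":
    # Paste or assign the output as config["speechContexts"].
    print(json.dumps(build_speech_contexts(EMAIL_BOOSTS), indent=2))
```

Keying by boost keeps each tier in one place, so tuning a boost value means editing a single number rather than hunting through the array.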

Note that the phrases with negative boost values help weed out words that are often misrecognized.

Upvotes: 1
