Nguyễn Thu Phương

Reputation: 45

Empty Error when using Google Cloud Speech-to-text

I am trying to use the Google Speech-to-Text API from App Engine (which does not require a credential key). However, when I run the code to get the response, I receive an empty error.

const detectspeech =  async (audioBytes) => {
    try {
        const client = new speech.SpeechClient();
        const audio = {
            content: audioBytes,
        };
        const config = {
            enableAutomaticPunctuation: true,
            encoding: "LINEAR16",
            model: "default",
            languageCode: 'en-US',
        };
        const request = {
            audio: audio,
            config: config,
        };
        console.log("1");
        const [response] = await client.recognize(request);
        console.log("2");
        const transcription = response.results
            .map(result => result.alternatives[0].transcript)
            .join('\n');
        return { data: "Success"};

    }catch(e)
    {
        return {error: e};
    }

}

In the log, "1" is printed but "2" is not, so I presume the problem lies in the line await client.recognize(request);. However, when I catch the error and return it, I get an empty object, like {}.

That doesn't help much with debugging. Can anyone help? Thanks.

Upvotes: 1

Views: 563

Answers (2)

Nguyễn Thu Phương

Reputation: 45

Okay, so a lot of this has to do with me being new to Node.js. I should have logged e.message instead of returning e itself.
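For anyone hitting the same empty {}: an Error's message and stack are non-enumerable properties, so serializing the error object to JSON drops them. A minimal sketch of the effect is below; describeError is a hypothetical helper name, not part of any library, and the code field is only present if the library that threw the error sets one.

```javascript
// An Error's own properties (message, stack) are non-enumerable,
// so JSON serialization silently drops them.
const err = new Error('invalid formatting');

// This is effectively what `return { error: e }` turns into once serialized.
const naive = JSON.stringify({ error: err });

// Hypothetical helper: copy the useful fields into a plain object.
// `code` is undefined on a plain Error and is then omitted from the JSON.
function describeError(e) {
  return { name: e.name, message: e.message, code: e.code };
}

const useful = JSON.stringify({ error: describeError(err) });

console.log(naive);
console.log(useful);
```

In the catch block from the question, returning {error: e.message} (or the output of a helper like the one above) instead of {error: e} would have surfaced the real message.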

However, the underlying problem remains, and the actual error is: invalid formatting.

So to anyone seeking to use Google Speech-to-text with Facebook Messenger (which is what I am doing):

  • Facebook Messenger converts every audio attachment to an .mp4 file: mp3 -> mp4, wav -> mp4, everything.

  • Google Speech-to-Text will NOT accept the mp3 or mp4 audio formats. AFAIK it used to: the v1 RecognitionConfig lists MP3 support, but v1p1beta1 no longer has it.

  • If you test with the tool on the Speech-to-Text home page, you will see that even mp4 works, but that does not mean the API works with mp4. Why remove support for the most common audio file type? I wish I knew. This may change in the future, but for now, it just adds more work.

So what you need to do, or at least what I did successfully, is use a file conversion API, like Zamzar.

It took me a while to set up using their docs, but then again, I am new to Node.js. Basically:

  • Get the payload URL of your voice clip from Facebook Messenger.

  • Pass that URL to Zamzar for file conversion. Select the format 'wav'.

  • Check the conversion status.

  • When the status is finished, get the converted file.

  • Encode the converted file to base64.

  • Pass that to Google Speech-to-Text API, which can easily recognize 'wav' files without too much configuration.

  • Get the result.
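The "check the status, then encode and send" part of the steps above can be sketched roughly as follows. This is only an outline under assumptions: waitUntilFinished and buildRequest are hypothetical helper names, checkStatus stands in for whatever call your conversion service offers to read a job's status, and the status strings 'successful' and 'failed' are placeholders that should be checked against the converter's documentation. The request shape mirrors the one in the question.

```javascript
// Hypothetical helper: poll an async status check until the job is done.
// `checkStatus` is any function returning a Promise of a status string.
async function waitUntilFinished(checkStatus, { intervalMs = 1000, maxAttempts = 30 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await checkStatus();
    if (status === 'successful') return attempt; // placeholder status value
    if (status === 'failed') throw new Error('conversion failed'); // placeholder status value
    // Not finished yet: wait before asking again.
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  throw new Error('conversion timed out');
}

// Hypothetical helper: build the recognize() request from the converted WAV bytes.
function buildRequest(wavBytes) {
  return {
    audio: { content: Buffer.from(wavBytes).toString('base64') },
    config: { encoding: 'LINEAR16', languageCode: 'en-US' },
  };
}
```

Once the converted file has been downloaded, the object returned by buildRequest can be passed to client.recognize(request) as in the question.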

Upvotes: 0

Cloud Ace Wenyuan Jiang

Reputation: 2205

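Await the call inside an async route handler:

app.get('/', async (req, res) => {
  // Wait for detectspeech to resolve before sending the response
  res.send(await detectspeech());
});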

Upvotes: 1
