Reputation: 45
I am trying to use the Google Speech-to-Text API from App Engine (which does not require a credential key). However, when I run the code to get the response, I receive an empty error.
const detectspeech = async (audioBytes) => {
  try {
    const client = new speech.SpeechClient();
    const audio = {
      content: audioBytes,
    };
    const config = {
      enableAutomaticPunctuation: true,
      encoding: "LINEAR16",
      model: "default",
      languageCode: 'en-US',
    };
    const request = {
      audio: audio,
      config: config,
    };
    console.log("1");
    const [response] = await client.recognize(request);
    console.log("2");
    const transcription = response.results
      .map(result => result.alternatives[0].transcript)
      .join('\n');
    return { data: "Success" };
  } catch (e) {
    return { error: e };
  }
};
In the log, "1" is printed but "2" is not, so I presume the problem lies in the line await client.recognize(request);. However, when I catch the error, I get an error with an empty field, like {}.
That certainly doesn't help much in debugging. Can anyone help? Thanks.
Upvotes: 1
Views: 563
Reputation: 45
Okay, so a lot of this has to do with me being new to Node.js. I should have logged e.message instead.
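For anyone hitting the same empty {}: a minimal sketch of why it happens, assuming the returned object is serialized to JSON somewhere along the way (for example by a res.send call). The message and stack properties of an Error are non-enumerable, so JSON.stringify drops them. The toErrorPayload helper is a hypothetical name, not part of any library.

```javascript
// `message` and `stack` on an Error are non-enumerable,
// so JSON serialization of the Error object itself yields {}.
const err = new Error("invalid formatting");

console.log(JSON.stringify({ error: err }));         // {"error":{}}
console.log(JSON.stringify({ error: err.message })); // {"error":"invalid formatting"}

// Hypothetical helper: turn any thrown value into something serializable.
const toErrorPayload = (e) => ({
  error: e instanceof Error ? e.message : String(e),
});
```

In the catch block of the question, returning toErrorPayload(e) instead of { error: e } would have surfaced the real message.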
However, the core problem of the error remains, and that error is: invalid formatting.
So to anyone seeking to use Google Speech-to-text with Facebook Messenger (which is what I am doing):
Facebook Messenger converts every audio attachment to an .mp4 file: mp3 -> mp4, wav -> mp4 ... everything.
Google Speech-to-Text will NOT accept the mp3 or mp4 audio formats. They used to, AFAIK: their v1 RecognitionConfig has MP3 format support, but their v1p1beta1 no longer has it.
If you test using the tool on their Speech-to-Text home page, you will see that even mp4 works, but that does not mean the API works with mp4. Why remove support for the most common audio file type? I wish I knew. This may change in the future, but for now, it just adds more work.
So what you need to do, at least what I did successfully, is to use a file conversion API, like Zamzar.
Took me a while to set up using their doc, but then again, I am new to nodejs. Basically:
1. Get the payload url from Facebook Messenger for the url of your voice clip.
2. Pass that url to Zamzar for file conversion. Select format 'wav'.
3. Check the conversion status.
4. When the status is finished, get the converted file.
5. Encode the file to base64.
6. Pass that to Google Speech-to-Text API, which can easily recognize 'wav' files without too much configuration.
7. Get the result.
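The last steps (base64-encode the converted file and hand it to Speech-to-Text) might look roughly like the sketch below. It assumes the converted file is a 16-bit PCM WAV already downloaded to disk; buildRecognizeRequest and the file name clip.wav are hypothetical, and the Zamzar part is left out because it depends on your account setup.

```javascript
// Hypothetical helper: build a recognize() request from raw WAV bytes.
// Assumes 16-bit PCM WAV, which matches the LINEAR16 encoding.
const buildRecognizeRequest = (wavBuffer, languageCode = "en-US") => ({
  audio: {
    // Audio content is sent as a base64-encoded string.
    content: wavBuffer.toString("base64"),
  },
  config: {
    encoding: "LINEAR16",
    languageCode,
    enableAutomaticPunctuation: true,
  },
});

// Usage sketch (needs @google-cloud/speech and valid credentials):
// const fs = require("fs");
// const speech = require("@google-cloud/speech");
// const client = new speech.SpeechClient();
// const request = buildRecognizeRequest(fs.readFileSync("clip.wav"));
// const [response] = await client.recognize(request);
```

Keeping the request building in its own function makes it easy to check the payload without calling the API.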
Upvotes: 0
Reputation: 2205
Use
app.get('/', async (req, res) => {
  res.send(await detectspeech());
});
Upvotes: 1