Vikash Patel

Reputation: 71

Google Cloud Speech-to-Text (MP3 to text)

I am using the Google Cloud Platform Speech-to-Text API on a trial account, but I am not able to get text from an audio file. I do not know which encoding and sampleRateHertz value I should use for an MP3 file with a bit rate of 128 kbps. I have tried various options, but I am not getting a transcription.

const speech = require('@google-cloud/speech');

const config = {
  encoding: 'LINEAR16',  // also tried AMR, AMR_WB; LINEAR16 is meant for WAV
  sampleRateHertz: 16000,  // 16000 gives a blank result
  languageCode: 'en-US'
};

Upvotes: 7

Views: 11349

Answers (4)

bob tian

Reputation: 11

Currently, the MP3 encoding for Speech-to-Text is only available in the speech_v1p1beta1 module, so you must send your request through that module, and you will get what you want. Set the encoding to MP3. Here is a Python example:

from google.cloud import speech_v1p1beta1 as speech

client = speech.SpeechClient()
speech_file = "your mp3 file path"  # placeholder: path to your MP3 file
with open(speech_file, "rb") as audio_file:
    content = audio_file.read()

audio = speech.RecognitionAudio(content=content)
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.MP3,
    sample_rate_hertz=44100,  # should match the MP3 file's actual sample rate
    language_code="en-US",
)

response = client.recognize(config=config, audio=audio)

# Each result is for a consecutive portion of the audio. Iterate through
# them to get the transcripts for the entire audio file.
print(response)
for result in response.results:
    # The first alternative is the most likely one for this portion.
    print("Transcript: {}".format(result.alternatives[0].transcript))


Upvotes: 1

Grokify

Reputation: 16324

MP3 is now supported in beta:

MP3: Only available as beta. See the RecognitionConfig reference for details.

MP3: MP3 audio. Supports all standard MP3 bitrates (which range from 32 to 320 kbps). When using this encoding, sampleRateHertz can optionally be left unset if it is not known.
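To make the "optionally unset" point concrete, here is a minimal sketch that builds the JSON body for a synchronous recognize call, assuming the v1p1beta1 REST method speech:recognize, whose body takes a config object and an audio object with base64-encoded content. The helper name build_recognize_body is hypothetical, and the sketch only builds the body; it does not send the request or handle authentication.

```python
import base64
import json
from typing import Optional


def build_recognize_body(mp3_bytes: bytes,
                         language_code: str = "en-US",
                         sample_rate_hertz: Optional[int] = None) -> str:
    """Hypothetical helper: JSON body for a v1p1beta1 speech:recognize call."""
    config = {"encoding": "MP3", "languageCode": language_code}
    # For MP3, the quoted reference says sampleRateHertz may be left unset,
    # so the field is only included when the caller actually knows the rate.
    if sample_rate_hertz is not None:
        config["sampleRateHertz"] = sample_rate_hertz
    body = {
        "config": config,
        # Inline audio in a JSON request is carried as a base64 string.
        "audio": {"content": base64.b64encode(mp3_bytes).decode("ascii")},
    }
    return json.dumps(body)
```

In REST JSON the encoding is written as the enum name "MP3" rather than its numeric value.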

You can find out the sample rate using a variety of tools such as iTunes. CD-quality audio uses a sample rate of 44100 Hertz. Read more here:
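If no such tool is at hand, the sample rate can also be read from the first MP3 frame header. A minimal sketch, assuming a plain MPEG audio stream that is optionally preceded by an ID3v2 tag; the function name mp3_sample_rate is hypothetical, and it does not try to validate the whole stream, so a stray byte pattern inside non-audio data could still be mistaken for a frame.

```python
from typing import Optional

# Sample rates indexed by the 2-bit MPEG version field, then by the
# 2-bit sample-rate index (index 3 is reserved).
SAMPLE_RATES = {
    0b11: (44100, 48000, 32000),  # MPEG-1
    0b10: (22050, 24000, 16000),  # MPEG-2
    0b00: (11025, 12000, 8000),   # MPEG-2.5 (0b01 is reserved)
}


def skip_id3v2(data: bytes) -> int:
    """Return the offset just past a leading ID3v2 tag, or 0 if there is none."""
    if len(data) >= 10 and data[:3] == b"ID3":
        # Tag size is a 28-bit "syncsafe" integer: 7 useful bits per byte.
        size = (data[6] << 21) | (data[7] << 14) | (data[8] << 7) | data[9]
        footer = 10 if data[5] & 0x10 else 0  # ID3v2.4 footer flag
        return 10 + size + footer
    return 0


def mp3_sample_rate(data: bytes) -> Optional[int]:
    """Hypothetical helper: sample rate of the first valid-looking frame header."""
    i = skip_id3v2(data)
    while i + 3 < len(data):  # a frame header is 4 bytes long
        b1, b2 = data[i + 1], data[i + 2]
        # Frame sync: all 8 bits of byte 0 plus the top 3 bits of byte 1 set.
        if data[i] == 0xFF and (b1 & 0xE0) == 0xE0:
            version = (b1 >> 3) & 0b11
            layer = (b1 >> 1) & 0b11        # 0b00 is reserved
            bitrate_index = b2 >> 4         # 0b1111 is invalid
            rate_index = (b2 >> 2) & 0b11   # 0b11 is reserved
            if (version in SAMPLE_RATES and layer != 0
                    and bitrate_index != 0xF and rate_index != 3):
                return SAMPLE_RATES[version][rate_index]
        i += 1
    return None
```

Usage would be along the lines of reading the whole file with open(path, "rb") and passing the bytes to mp3_sample_rate; the result can then go into sampleRateHertz.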

To use this in a Google SDK, you may need to use one of the beta SDKs that define it. Here is the constant from the Go beta SDK:

RecognitionConfig_MP3 RecognitionConfig_AudioEncoding = 8

Upvotes: 8

Pic Mickael

Reputation: 1274

According to the official documentation (https://cloud.google.com/speech-to-text/docs/encoding), only the following formats are supported:

  • FLAC
  • LINEAR16
  • MULAW
  • AMR
  • AMR_WB
  • OGG_OPUS
  • SPEEX_WITH_HEADER_BYTE

Anything else will be rejected.
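Given that list, a client can reject unsupported files before uploading anything. A minimal pre-flight sketch, assuming the encoding can be guessed from the file extension; the extension mapping and the encoding_for name are hypothetical, and treating .wav as LINEAR16 assumes the WAV file holds 16-bit PCM.

```python
import os

# The encodings listed in this answer as supported.
SUPPORTED_ENCODINGS = {
    "FLAC", "LINEAR16", "MULAW", "AMR", "AMR_WB",
    "OGG_OPUS", "SPEEX_WITH_HEADER_BYTE",
}

# Hypothetical extension-to-encoding guesses (".wav" assumes 16-bit PCM).
EXTENSION_TO_ENCODING = {
    ".flac": "FLAC",
    ".wav": "LINEAR16",
    ".amr": "AMR",
    ".awb": "AMR_WB",
    ".opus": "OGG_OPUS",
    ".mp3": "MP3",
}


def encoding_for(path: str) -> str:
    """Hypothetical helper: guess the encoding, refusing unsupported formats."""
    ext = os.path.splitext(path)[1].lower()
    encoding = EXTENSION_TO_ENCODING.get(ext)
    if encoding is None:
        raise ValueError(f"unknown audio extension: {ext!r}")
    if encoding not in SUPPORTED_ENCODINGS:
        raise ValueError(f"{encoding} is not accepted; convert to FLAC or LINEAR16 first")
    return encoding
```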

Your best bet is to convert the MP3 file to either FLAC or LINEAR16.

Honestly, it is annoying that Google does not support MP3 out of the box, unlike Amazon, IBM and Microsoft, which do. It forces us to jump through hoops and also increases bandwidth usage, since FLAC and LINEAR16 are lossless and therefore much bigger to transmit.
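A rough idea of the size difference: uncompressed LINEAR16 costs sample rate × bits per sample × channels. The sketch below compares that with the 128 kbps MP3 from the question. FLAC is lossless but compressed, so its real bitrate is usually somewhat below the PCM figure and depends on the audio; the numbers here are for raw LINEAR16 only.

```python
def pcm_kbps(sample_rate_hz: int, channels: int, bits_per_sample: int = 16) -> float:
    """Uncompressed PCM bitrate in kilobits per second (1 kbit = 1000 bits)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


MP3_KBPS = 128  # bit rate of the MP3 in the question

cd_quality = pcm_kbps(44100, channels=2)   # 1411.2 kbps
speech_mono = pcm_kbps(16000, channels=1)  # 256.0 kbps

print(f"44.1 kHz stereo LINEAR16: {cd_quality} kbps, {cd_quality / MP3_KBPS:.1f}x the MP3")
print(f"16 kHz mono LINEAR16: {speech_mono} kbps, {speech_mono / MP3_KBPS:.1f}x the MP3")
```

Downmixing to mono at 16 kHz narrows the gap to about twice the MP3 bit rate, which is one reason conversions for speech recognition often target that format.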

Upvotes: 3

Rejo Chandran

Reputation: 609

I had the same issue and resolved it by converting it to FLAC.

Try converting your audio to FLAC and use

encoding: 'FLAC',

For the conversion, you can use SoX (see https://www.npmjs.com/package/sox).
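A minimal sketch of that workflow in Python, swapping the npm package for the SoX command-line tool called via subprocess. It assumes the sox binary is on PATH and was built with MP3 read support; resampling to 16 kHz mono is an illustrative choice, not a requirement from the answer, and the helper names are hypothetical. The matching config uses the same camelCase field names as the question's Node.js config.

```python
import subprocess
from typing import Dict, List, Union


def sox_to_flac_command(src: str, dst: str, rate: int = 16000) -> List[str]:
    """Hypothetical helper: SoX argv that converts src to mono FLAC at `rate` Hz."""
    # Format options placed before the output file (-r rate, -c channels)
    # apply to the output, so SoX resamples and downmixes during conversion.
    return ["sox", src, "-r", str(rate), "-c", "1", dst]


def flac_config(rate: int = 16000, language_code: str = "en-US") -> Dict[str, Union[str, int]]:
    """Hypothetical helper: recognition config matching the converted file."""
    return {"encoding": "FLAC", "sampleRateHertz": rate, "languageCode": language_code}


def convert(src: str, dst: str, rate: int = 16000) -> None:
    # Raises CalledProcessError if SoX fails (e.g. no MP3 support compiled in).
    subprocess.run(sox_to_flac_command(src, dst, rate), check=True)
```

Keeping the rate in one parameter shared by both helpers avoids the mismatch between the file's real sample rate and sampleRateHertz that the question ran into.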

Upvotes: 2
