Possible sample rates in Google Speech-to-Text?

Question

I'm using the function provided in the Google Cloud docs that transcribes an audio file stored in Cloud Storage:

def transcribe_gcs(gcs_uri):
    """Asynchronously transcribes the audio file specified by the gcs_uri."""
    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types
    client = speech.SpeechClient()

    audio = types.RecognitionAudio(uri=gcs_uri)
    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
        sample_rate_hertz=48000,
        language_code='en-US')

    operation = client.long_running_recognize(config, audio)

    print('Waiting for operation to complete...')
    response = operation.result(timeout=2000)

    # Print the first alternative of all the consecutive results.
    for result in response.results:
        print('Transcript: {}'.format(result.alternatives[0].transcript))
        print('Confidence: {}'.format(result.alternatives[0].confidence))
    return ' '.join(result.alternatives[0].transcript for result in response.results)

By default, sample_rate_hertz is set to 16000. I changed it to 48000, but I've been having trouble setting it any higher, such as 64k or 96k. Is 48k the upper limit for the sample rate?

dsesto · Accepted Answer

As specified in the documentation for Cloud Speech API, 48000 Hz is indeed the upper bound supported by this API.

Sample rates between 8000 Hz and 48000 Hz are supported within the Speech API.
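To see whether a given file falls inside that range before sending it, you can read the sample rate straight from the FLAC header. The snippet below is only a sketch: the helper names (flac_sample_rate, is_supported_rate) are made up for illustration, and it assumes a plain FLAC file that starts with the "fLaC" marker followed by a STREAMINFO block, with no other tag data prepended.

```python
# Range quoted in the answer above (Hz).
MIN_RATE_HZ = 8000
MAX_RATE_HZ = 48000


def flac_sample_rate(header: bytes) -> int:
    """Return the sample rate stored in the first 21 bytes of a FLAC stream.

    Hypothetical helper; assumes the stream starts with the 'fLaC' marker
    and that the first metadata block is STREAMINFO.
    """
    if len(header) < 21 or header[:4] != b"fLaC":
        raise ValueError("not a FLAC stream (missing 'fLaC' marker)")
    # Byte 4 is the metadata block header; its low 7 bits hold the
    # block type, and type 0 is STREAMINFO.
    if header[4] & 0x7F != 0:
        raise ValueError("first metadata block is not STREAMINFO")
    # STREAMINFO starts at byte 8. After 10 bytes of block/frame sizes,
    # the sample rate is a 20-bit big-endian field spanning bytes 18-20.
    return (header[18] << 12) | (header[19] << 4) | (header[20] >> 4)


def is_supported_rate(rate_hz: int) -> bool:
    """True if rate_hz lies within the range quoted above."""
    return MIN_RATE_HZ <= rate_hz <= MAX_RATE_HZ
```

For a local copy of the file, something like `flac_sample_rate(open(path, "rb").read(21))` gives the value to compare against sample_rate_hertz.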

Therefore, to transcribe audio recorded at a higher sample rate, you will have to resample your audio files to 48000 Hz or lower first.
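One possible way to do the resampling is with the ffmpeg command-line tool, whose -ar option sets the output sample rate. This is a rough sketch rather than anything from the Speech API docs: it assumes ffmpeg is installed and on your PATH, and the function names and file paths are placeholders.

```python
import subprocess


def build_resample_command(src_path, dst_path, target_rate_hz=48000):
    """Build an ffmpeg command that resamples src_path into dst_path.

    Hypothetical helper; the 8000-48000 Hz check mirrors the range
    quoted in the answer above.
    """
    if not 8000 <= target_rate_hz <= 48000:
        raise ValueError("target rate must be between 8000 and 48000 Hz")
    # -y overwrites dst_path if it already exists;
    # -ar sets the sample rate of the output audio.
    return ["ffmpeg", "-y", "-i", src_path,
            "-ar", str(target_rate_hz), dst_path]


def resample(src_path, dst_path, target_rate_hz=48000):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(
        build_resample_command(src_path, dst_path, target_rate_hz),
        check=True,
    )
```

After running something like `resample("input_96k.flac", "output_48k.flac")`, upload the new file to Cloud Storage and make sure sample_rate_hertz in the config matches the rate you resampled to.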

Let me also refer you to this other page where the basic information of features supported by Cloud Speech API can be found.
