Reputation: 1813
I'm using the function provided in the GCS docs that allows me to transcribe text in Cloud Storage:
def transcribe_gcs(gcs_uri):
"""Asynchronously transcribes the audio file specified by the gcs_uri."""
from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types
client = speech.SpeechClient()
audio = types.RecognitionAudio(uri=gcs_uri)
config = types.RecognitionConfig(
encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
sample_rate_hertz=48000,
language_code='en-US')
operation = client.long_running_recognize(config, audio)
print('Waiting for operation to complete...')
response = operation.result(timeout=2000)
# Print the first alternative of all the consecutive results.
for result in response.results:
print('Transcript: {}'.format(result.alternatives[0].transcript))
print('Confidence: {}'.format(result.alternatives[0].confidence))
return ' '.join(result.alternatives[0].transcript for result in response.results)
By default, sample_rate_hertz
is set at 16000. I changed it to 48000, but I've been having trouble setting it any higher, such as at 64k or 96k. Is 48k is the upper range of the sample rate?
Upvotes: 0
Views: 1357
Reputation: 8178
As specified in the documentation for Cloud Speech API, 48000 Hz is indeed the upper bound supported by this API.
Sample rates between 8000 Hz and 48000 Hz are supported within the Speech API.
Therefore, in order to work with higher sample rates you will have to resample your audio files.
Let me also refer you to this other page where the basic information of features supported by Cloud Speech API can be found.
Upvotes: 1