user2982126

Reputation: 11

Getting the duration of an audio file that is being converted to text

Is there any way to get the duration, in seconds, of the audio file that we are converting to text? We can see a "totalBilledTime" in the response body. Can we treat this as the duration of the audio? Also, is there any limit on the size or duration of the audio file used for the conversion?

Upvotes: 1

Views: 942

Answers (1)

RJC

Reputation: 1338

Following the quickstart guide and using a speech recognize request, I created some sample Python code:

  1. Make sure the google-cloud-speech library is installed in Cloud Shell (pip install google-cloud-speech). Before installing the library, make sure the environment is set up for Python development.

  2. Create the Python code that converts the speech into text.

speech-to-text.py

# Imports the Google Cloud client library
from google.cloud import speech

# Instantiates a client
client = speech.SpeechClient()

# Cloud Storage URI of the audio file to transcribe
# (Google_Gnome.wav is about 55 seconds long)
gcs_uri = "gs://cloud-samples-data/speech/Google_Gnome.wav"

audio = speech.RecognitionAudio(uri=gcs_uri)

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    enable_word_time_offsets=True,
)

# Detects speech in the audio file
response = client.recognize(config=config, audio=audio)
# Uncomment the next line to print the whole response, including the full transcript
# print(response)
# last_word is the last word recognized in the audio file;
# word-level offsets are returned on the top alternative (index 0)
last_word = response.results[-1].alternatives[0].words[-1]
print("Last Word: ", last_word.word)
print("Last Word End Time: ", last_word.end_time)

When enable_word_time_offsets is set to True, the top result includes a list of words with the start and end time offsets (timestamps) for each word. When it is False, no word-level time offset information is returned. The default is False, as stated in the RecognitionConfig documentation.

Running speech-to-text.py prints the last word of the transcribed audio and its end time, which approximates the duration of the audio (any silence after the last word is not counted):

Last Word:  return
Last Word End Time:  0:00:55.400000
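
Since the question asks for the duration in seconds, here is a minimal sketch of turning that end time into a number. It assumes end_time is a datetime.timedelta, which the printed 0:00:55.400000 suggests. The helper name last_word_end_seconds and the stand-in fake_response are hypothetical and only mimic the fields used above, so the snippet runs without calling the API.

```python
from datetime import timedelta
from types import SimpleNamespace


def last_word_end_seconds(response):
    """Return the end offset of the last recognized word, in seconds.

    Assumes word offsets were requested (enable_word_time_offsets=True)
    and that each word's end_time is a datetime.timedelta.
    Returns None if no result carries word information.
    """
    # Walk the results from the end and skip any without word info.
    for result in reversed(response.results):
        if result.alternatives and result.alternatives[0].words:
            return result.alternatives[0].words[-1].end_time.total_seconds()
    return None


# Stand-in for a real response (hypothetical, for illustration only).
fake_word = SimpleNamespace(
    word="return", end_time=timedelta(seconds=55, milliseconds=400)
)
fake_response = SimpleNamespace(
    results=[SimpleNamespace(alternatives=[SimpleNamespace(words=[fake_word])])]
)

seconds = last_word_end_seconds(fake_response)
print("Approximate duration in seconds:", seconds)
```

With a real response from client.recognize, the same helper could be called as last_word_end_seconds(response). The value is an estimate: it ends at the last recognized word, not at the end of the file.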

The Speech-to-Text API also has request limits on usage, which are listed in the Speech-to-Text quotas and limits documentation.
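
Because those limits depend on how long the audio is, it may help to measure a file locally before sending it. The sketch below uses Python's standard wave module and assumes a local, uncompressed WAV file; it does not apply to a gs:// URI or to compressed formats. The function name wav_duration_seconds and the demo file are hypothetical.

```python
import os
import tempfile
import wave


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        # Duration is the number of frames divided by frames per second.
        return wav_file.getnframes() / wav_file.getframerate()


# Demo: write two seconds of 16 kHz mono silence, then measure it.
demo_path = os.path.join(tempfile.mkdtemp(), "silence.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)  # 16-bit samples, as in LINEAR16
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000 * 2)

demo_seconds = wav_duration_seconds(demo_path)
print("Duration in seconds:", demo_seconds)
```

Unlike the last word's end time, this counts the whole file, including any leading or trailing silence, so it is the number to compare against the documented limits.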

Upvotes: 1
