Gablo Ficazzo

Reputation: 11

GoogleCloud Speech2Text "long_running_recognize" response object un-iterable

When I run a Speech-to-Text API request on Google Cloud (the audio is over 60 s, so I need to use the long_running_recognize function and read the audio from a Cloud Storage bucket), I get a proper text response, but I cannot iterate through the LongRunningResponse object that is returned, which makes the information inside it nearly useless.

When I use the plain "client.recognize()" function, I get a response similar to the long-running one, but in that short form I can iterate through the results just fine.

I run nearly identical parameters through each recognize function (a 1m40s audio file for the long-running call and a 30 s file for the short recognize, both from my cloud bucket).

short_response = client.recognize(config=config, audio=audio_uri)

subs_list = []
for result in short_response.results:
    for alternative in result.alternatives:
        for word in alternative.words:
            # The first word may have an empty (falsy) start_time
            if not word.start_time:
                start = 0
            else:
                start = word.start_time.total_seconds()
            end = word.end_time.total_seconds()
            t = word.word
            subs_list.append(((float(start), float(end)), t))

print(subs_list)

The code above works fine: the ".results" attribute returns objects that I can iterate through and read further attributes from. I use the for loops to create subtitles for a video. I then try a similar thing with long_running_recognize and get this:

long_response = client.long_running_recognize(config=config, audio=audio_uri)

#1
print(long_response.results)

#2
print(long_response.result())

#1 raises an error: AttributeError: 'Operation' object has no attribute 'results'. Did you mean: 'result'?

#2 prints the info I need, but when I check "type(long_response.result())" I get: <class 'google.cloud.speech_v1.types.cloud_speech.LongRunningRecognizeResponse'>

I assume this is not an iterable object, and I cannot figure out how to apply a process similar to the one I use for the recognize function to get the subtitles the way I need.

Upvotes: 1

Views: 125

Answers (1)

vizsatiz

Reputation: 2183

It took me some time to figure this out: you have to parse the response yourself. Pass the object returned by long_response.result() to a function like this:

def serialize_response(response):
    """Convert a recognize response into plain dicts and lists."""
    result_dict = {
        "results": []
    }

    for result in response.results:
        # Each result contains a list of alternatives
        alternatives = [
            {
                "transcript": alternative.transcript,
                "confidence": alternative.confidence,
                "words": [{
                    # Assumes timedelta offsets, as in the question;
                    # total_seconds() keeps the fractional part
                    "start_time": word.start_time.total_seconds(),
                    "end_time": word.end_time.total_seconds(),
                    "word": word.word,
                    "speaker_tag": word.speaker_tag
                } for word in alternative.words]
            } for alternative in result.alternatives
        ]

        result_dict["results"].append({
            "alternatives": alternatives
        })

    return result_dict
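
If you only need the subtitle tuples from the question, the same loop should also work on the object returned by .result(), since that object exposes .results just like the short response does. Below is a minimal sketch of that idea. The function name collect_word_timings and the fake_response stand-in are hypothetical and exist only so the snippet runs without credentials. It assumes the word offsets are datetime.timedelta values, as the question's use of total_seconds() suggests.

```python
from datetime import timedelta
from types import SimpleNamespace


def collect_word_timings(response):
    """Build ((start, end), word) tuples from any object shaped like a recognize response."""
    subs = []
    for result in response.results:
        for alternative in result.alternatives:
            for word in alternative.words:
                # A zero-length timedelta is falsy, so treat it as 0.0 seconds
                start = word.start_time.total_seconds() if word.start_time else 0.0
                end = word.end_time.total_seconds()
                subs.append(((start, end), word.word))
    return subs


# With the real client this would be roughly (based on the question's code;
# the timeout value is an arbitrary assumption):
#   operation = client.long_running_recognize(config=config, audio=audio_uri)
#   response = operation.result(timeout=300)  # blocks until the job finishes
#   subs_list = collect_word_timings(response)

# Hypothetical stand-in that mimics the response shape, for a local dry run
fake_response = SimpleNamespace(results=[
    SimpleNamespace(alternatives=[
        SimpleNamespace(words=[
            SimpleNamespace(word="hello", start_time=timedelta(0),
                            end_time=timedelta(seconds=0.5)),
            SimpleNamespace(word="world", start_time=timedelta(seconds=0.5),
                            end_time=timedelta(seconds=1.25)),
        ])
    ])
])

print(collect_word_timings(fake_response))
```

The key point is that .result() has to be called on the Operation first. The AttributeError in the question comes from reading .results on the Operation itself rather than on the response it resolves to.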

Upvotes: 1
