Handling the response from Google Speech

Question

I have a speech-to-text app, and I'm somewhat in the dark about how to handle the response efficiently and organize it into a transcription. I feed the transcriber function 45-second chunks like this: all_text = pool.map(transcribe, enumerate(files)). This is the response I get:

all text:  [{'idx': 0, 'text': ['users outnumber', ' future'], 'participant': 'str_MIC_Ct3G_con_O6qn4m00bs', 'file_index': 0, 'words': [{'word': 'users', 'start_time': 0, 'participant': 'str_MIC_Ct3G_con_O6qn4m00bs'}, {'word': 'outnumber', 'start_time': 0, 'participant': 'str_MIC_Ct3G_con_O6qn4m00bs'}, {'word': 'future', 'start_time': 4, 'participant': 'str_MIC_Ct3G_con_O6qn4m00bs'}]}, 
{'idx': 1, 'text': ['and the sustainable energy'], 'participant': 'str_MIC_Ct3G_con_O6qn4m00bs', 'file_index': 1, 'words': [{'word': 'and', 'start_time': 45, 'participant': 'str_MIC_Ct3G_con_O6qn4m00bs'}, {'word': 'the', 'start_time': 45, 'participant': 'str_MIC_Ct3G_con_O6qn4m00bs'}, {'word': 'sustainable', 'start_time': 45, 'participant': 'str_MIC_Ct3G_con_O6qn4m00bs'}, {'word': 'energy', 'start_time': 52, 'participant': 'str_MIC_Ct3G_con_O6qn4m00bs'}]}]
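A minimal sketch of one option, assuming each participant's audio is split into non-overlapping chunks and the API returns each chunk's words in spoken order. Pool.map returns its results in the same order as its input iterable, so the list above already arrives in idx order, and a single participant's transcript can be built by concatenating the chunks' words without any global sort. The data is a trimmed copy of the response above (participant fields dropped), and transcript_in_chunk_order is a hypothetical helper name.

```python
# Trimmed copy of the `all_text` result shown above.
all_text = [
    {"idx": 0, "file_index": 0, "words": [
        {"word": "users", "start_time": 0},
        {"word": "outnumber", "start_time": 0},
        {"word": "future", "start_time": 4},
    ]},
    {"idx": 1, "file_index": 1, "words": [
        {"word": "and", "start_time": 45},
        {"word": "the", "start_time": 45},
        {"word": "sustainable", "start_time": 45},
        {"word": "energy", "start_time": 52},
    ]},
]


def transcript_in_chunk_order(chunks):
    # Pool.map already returns chunks in input order; sorting by idx is a cheap
    # safeguard in case the list was assembled some other way.
    ordered_chunks = sorted(chunks, key=lambda c: c["idx"])
    # Concatenate each chunk's words, assumed to be in spoken order already.
    return " ".join(w["word"] for chunk in ordered_chunks for w in chunk["words"])


print(transcript_in_chunk_order(all_text))
```

For the sample data this prints "users outnumber future and the sustainable energy".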

So here I had two 45-second chunks from Elon Musk's speech. I cut most of the response to keep it short, but as you can see, there are two chunks, with indexes 0 and 1. How can I build the transcription from this response based on each word's start_time value? Here I used only the seconds, but of course I can get the nanos as well. Is it OK to create another list, push all the words into it, and then sort it by start_time? That leads to my second question: how efficient is this? If I end up with a mile-long list of words and other info from multiple users, am I likely to run into problems? Is there a better way of doing this?
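Sorting one list of n words costs O(n log n), which is usually not the bottleneck even for long sessions. When several participants each have a word list that is already in start_time order, a k-way merge avoids re-sorting everything: heapq.merge lazily merges k sorted inputs in O(n log k) time without building one combined list itself. The sketch below uses made-up participants "A" and "B" and hypothetical helper names (merge_streams, speaker_turns); it merges the streams and then uses itertools.groupby to turn consecutive words from the same participant into speaker turns. With whole-second timestamps, ties across participants are common; keying on seconds plus nanos (as a float or a (seconds, nanos) tuple) gives a finer ordering.

```python
import heapq
from itertools import groupby
from operator import itemgetter

# Made-up word lists for two participants, each already in start_time order.
mic_a = [
    {"word": "users", "start_time": 0, "participant": "A"},
    {"word": "outnumber", "start_time": 0, "participant": "A"},
    {"word": "future", "start_time": 4, "participant": "A"},
]
mic_b = [
    {"word": "hello", "start_time": 2, "participant": "B"},
    {"word": "there", "start_time": 5, "participant": "B"},
]


def merge_streams(*streams):
    # Lazy k-way merge of already-sorted inputs: O(n log k) for n words in k streams.
    # Like sorted(itertools.chain(*streams)), equal start_times keep input order.
    return heapq.merge(*streams, key=itemgetter("start_time"))


def speaker_turns(words):
    # Collapse consecutive words from the same participant into one line.
    for participant, group in groupby(words, key=itemgetter("participant")):
        yield f"{participant}: " + " ".join(w["word"] for w in group)


for line in speaker_turns(merge_streams(mic_a, mic_b)):
    print(line)
```

This prints four speaker turns: "A: users outnumber", "B: hello", "A: future", "B: there". Because both functions are generators, the merged words can be written out as they are produced instead of being held in one mile-long list.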

EDIT: This is what I tried. It works for short sessions, but the app crashes with longer ones. Could it be because the list gets too big?

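```python
# `alternative` is a recognition alternative from the Speech API response and
# `participant` is defined earlier (surrounding code not shown).
words = []
clean_transcript = ''

# Collect every recognized word with its start time (whole seconds only).
for word in alternative.words:
    words.append({'word': word.word, 'start_time': word.start_time.seconds, 'participant': participant})

# Order the words chronologically and dump them for debugging.
words.sort(key=lambda x: x['start_time'])
print('ALL WORDS: ', words)

# Build the transcript one word at a time.
for w in words:
    clean_transcript += w['word'] + ' '

print(clean_transcript)
```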
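A possible rework of the loop above, offered as a sketch rather than a diagnosis of the crash: it builds the list with a comprehension, sorts it, and produces the text with a single str.join, which avoids growing a string with += and leaves no trailing space. It also drops the print of the whole word list, which produces a lot of output for long sessions. The helper names start_seconds, words_from_alternative and build_transcript are hypothetical. start_seconds accepts either a datetime.timedelta (which recent versions of the google-cloud-speech Python client return for start_time) or an object with seconds and nanos fields, like the protobuf Duration used by older versions. Google reports word offsets relative to the beginning of the audio it received, so the hypothetical offset parameter lets you shift each 45-second chunk to session time (for example file_index * 45). The SimpleNamespace objects in the usage lines are stand-ins for real API word objects.

```python
from datetime import timedelta
from types import SimpleNamespace


def start_seconds(start_time):
    """Return start_time as float seconds.

    Accepts a datetime.timedelta or any object with .seconds and .nanos
    (such as a protobuf Duration)."""
    if hasattr(start_time, "total_seconds"):
        return start_time.total_seconds()
    return start_time.seconds + start_time.nanos / 1e9


def words_from_alternative(alternative, participant, offset=0.0):
    # Hypothetical helper: one dict per recognized word, with the start time
    # shifted by `offset` seconds (e.g. the chunk's position in the session).
    return [
        {
            "word": w.word,
            "start_time": offset + start_seconds(w.start_time),
            "participant": participant,
        }
        for w in alternative.words
    ]


def build_transcript(words):
    # sorted() is stable, so words sharing a start_time keep their original order.
    ordered = sorted(words, key=lambda w: w["start_time"])
    # A single join instead of repeated string concatenation.
    return " ".join(w["word"] for w in ordered)


# Usage with stand-in objects in place of real API word objects.
alternative = SimpleNamespace(words=[
    SimpleNamespace(word="and", start_time=timedelta(seconds=0)),
    SimpleNamespace(word="energy", start_time=SimpleNamespace(seconds=7, nanos=500_000_000)),
])
words = words_from_alternative(alternative, "participant-1", offset=45)
print(build_transcript(words))
```

With these stand-ins the words start at 45.0 and 52.5 seconds, and the script prints "and energy".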

Is there some obvious "don't do it like this"?
