Reputation: 193
Below is the code,
import json
import os
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
import azure.cognitiveservices.speech as speechsdk
def main(filename):
container_name="test-container"
print(filename)
blob_service_client = BlobServiceClient.from_connection_string("DefaultEndpoint")
container_client=blob_service_client.get_container_client(container_name)
blob_client = container_client.get_blob_client(filename)
with open(filename, "wb") as f:
data = blob_client.download_blob()
data.readinto(f)
speech_key, service_region = "1234567", "eastus"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
audio_input = speechsdk.audio.AudioConfig(filename=filename)
print("Audio Input:-",audio_input)
speech_config.speech_recognition_language="en-US"
speech_config.request_word_level_timestamps()
speech_config.enable_dictation()
speech_config.output_format = speechsdk.OutputFormat(1)
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_input)
print("speech_recognizer:-",speech_recognizer)
#result = speech_recognizer.recognize_once()
all_results = []
def handle_final_result(evt):
all_results.append(evt.result.text)
done = False
def stop_cb(evt):
#print('CLOSING on {}'.format(evt))
speech_recognizer.stop_continuous_recognition()
global done
done= True
#Appends the recognized text to the all_results variable.
speech_recognizer.recognized.connect(handle_final_result)
speech_recognizer.session_stopped.connect(stop_cb)
speech_recognizer.canceled.connect(stop_cb)
speech_recognizer.start_continuous_recognition()
#while not done:
#time.sleep(.5)
print("Printing all results from speech to text:")
print(all_results)
main(filename="test.wav")
Error while calling from main function,
test.wav
Audio Input:- <azure.cognitiveservices.speech.audio.AudioConfig object at 0x00000204D72F4E88>
speech_recognizer:- <azure.cognitiveservices.speech.SpeechRecognizer object at 0x00000204D7065148>
[]
Expected Output (Output without using main function)
test.wav
Audio Input:- <azure.cognitiveservices.speech.audio.AudioConfig object at 0x00000204D72F4E88>
speech_recognizer:- <azure.cognitiveservices.speech.SpeechRecognizer object at 0x00000204D7065148>
Printing all results from speech to text:
['hi', '', '', 'Uh.', 'A good laugh.', '1487', "OK, OK, I think that's enough.", '']
The existing code works perfectly if we do not use the main function, but when i call this using main function i do not get the desired output. Please guide us in the missing part.
Upvotes: 2
Views: 2910
Reputation: 4164
As described in the article here,recognize_once_async() (the method that you re using) - this method will only detect a recognized utterance from the input starting at the beginning of detected speech until the next pause.
From my understanding, your requirement would be to met if you make use of the start_continuous_recognition().The start function will start and continue processing all utterances until you invoke the stop function.
This method has lot of events connected to it, the "recognized" event fires when speech recognition process occurs. You need to have an event handler in place to handle recognition and extract the text. You could refer to the article here for more information.
Sharing a sample snippet that makes use of the start_continuous_recognition() to convert audio to text.
import azure.cognitiveservices.speech as speechsdk
import time
import datetime
# Creates an instance of a speech config with specified subscription key and service region.
# Replace with your own subscription key and region identifier from here: https://aka.ms/speech/sdkregion
speech_key, service_region = "YOURSUBSCRIPTIONKEY", "YOURREGION"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
# Creates an audio configuration that points to an audio file.
# Replace with your own audio filename.
audio_filename = "sample.wav"
audio_input = speechsdk.audio.AudioConfig(filename=audio_filename)
# Creates a recognizer with the given settings
speech_config.speech_recognition_language="en-US"
speech_config.request_word_level_timestamps()
speech_config.enable_dictation()
speech_config.output_format = speechsdk.OutputFormat(1)
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_input)
#result = speech_recognizer.recognize_once()
all_results = []
#https://learn.microsoft.com/en-us/python/api/azure-cognitiveservices-speech/azure.cognitiveservices.speech.recognitionresult?view=azure-python
def handle_final_result(evt):
all_results.append(evt.result.text)
done = False
def stop_cb(evt):
print('CLOSING on {}'.format(evt))
speech_recognizer.stop_continuous_recognition()
global done
done= True
#Appends the recognized text to the all_results variable.
speech_recognizer.recognized.connect(handle_final_result)
#Connect callbacks to the events fired by the speech recognizer & displays the info/status
#Ref:https://learn.microsoft.com/en-us/python/api/azure-cognitiveservices-speech/azure.cognitiveservices.speech.eventsignal?view=azure-python
speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
speech_recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
speech_recognizer.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))
# stop continuous recognition on either session stopped or canceled events
speech_recognizer.session_stopped.connect(stop_cb)
speech_recognizer.canceled.connect(stop_cb)
speech_recognizer.start_continuous_recognition()
while not done:
time.sleep(.5)
print("Printing all results:")
print(all_results)
Calling the same through a function
Encapsulated in a function and tried calling it.
Just tweaked some more and encapsulated in a function. made sure the variable "done" is accessed non locally. Pls check and let me know
import azure.cognitiveservices.speech as speechsdk
import time
import datetime
def speech_to_text():
# Creates an instance of a speech config with specified subscription key and service region.
# Replace with your own subscription key and region identifier from here: https://aka.ms/speech/sdkregion
speech_key, service_region = "<>", "<>"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
# Creates an audio configuration that points to an audio file.
# Replace with your own audio filename.
audio_filename = "whatstheweatherlike.wav"
audio_input = speechsdk.audio.AudioConfig(filename=audio_filename)
# Creates a recognizer with the given settings
speech_config.speech_recognition_language="en-US"
speech_config.request_word_level_timestamps()
speech_config.enable_dictation()
speech_config.output_format = speechsdk.OutputFormat(1)
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_input)
#result = speech_recognizer.recognize_once()
all_results = []
#https://learn.microsoft.com/en-us/python/api/azure-cognitiveservices-speech/azure.cognitiveservices.speech.recognitionresult?view=azure-python
def handle_final_result(evt):
all_results.append(evt.result.text)
done = False
def stop_cb(evt):
print('CLOSING on {}'.format(evt))
speech_recognizer.stop_continuous_recognition()
nonlocal done
done= True
#Appends the recognized text to the all_results variable.
speech_recognizer.recognized.connect(handle_final_result)
#Connect callbacks to the events fired by the speech recognizer & displays the info/status
#Ref:https://learn.microsoft.com/en-us/python/api/azure-cognitiveservices-speech/azure.cognitiveservices.speech.eventsignal?view=azure-python
speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
speech_recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
speech_recognizer.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))
# stop continuous recognition on either session stopped or canceled events
speech_recognizer.session_stopped.connect(stop_cb)
speech_recognizer.canceled.connect(stop_cb)
speech_recognizer.start_continuous_recognition()
while not done:
time.sleep(.5)
print("Printing all results:")
print(all_results)
#calling the conversion through a function
speech_to_text()
Upvotes: 6