Reputation: 311
I would like to see the accuracy of the speech services from Azure, specifically speech-to-text using an audio file.
I have been reading the documentation (https://learn.microsoft.com/en-us/python/api/azure-cognitiveservices-speech/?view=azure-python) and experimenting with the sample code from the Microsoft quickstart page. The code works fine and I get a transcription, but it only transcribes the beginning of the audio (the first utterance):
import azure.cognitiveservices.speech as speechsdk

speechKey = 'xxx'
service_region = 'westus'
speech_config = speechsdk.SpeechConfig(subscription=speechKey, region=service_region, speech_recognition_language="es-MX")
audio_config = speechsdk.audio.AudioConfig(use_default_microphone=False, filename='lala.wav')
sr = speechsdk.SpeechRecognizer(speech_config, audio_config)
es = speechsdk.EventSignal(sr.recognized, sr.recognized)
result = sr.recognize_once()
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized: {}".format(result.text))
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print("Speech Recognition canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(cancellation_details.error_details))
Based on the documentation, it looks like I have to use signals and events to capture the full audio with the start_continuous_recognition method (which is not documented for Python, although the method and related classes appear to be implemented). I tried to follow other examples in C# and Java but was not able to implement this in Python.
Has anyone been able to do this and could provide some pointers? Thank you very much!
Upvotes: 5
Views: 9097
Reputation: 256
To further improve @manyways's solution, here is how to collect the recognized text:
all_results = []

def handle_final_result(evt):
    # Called once per finalized utterance; evt.result.text holds its text
    all_results.append(evt.result.text)

speech_recognizer.recognized.connect(handle_final_result)  # to collect data at the end
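A slightly fuller sketch of the same idea, in case you want one transcript string at the end. It assumes only that each event exposes evt.result.text, as in the handler above. TranscriptCollector and its method names are hypothetical helpers of mine, not part of the SDK, and skipping empty results is my own choice:

```python
class TranscriptCollector:
    """Accumulates the text of final recognition results (hypothetical helper)."""

    def __init__(self):
        self.segments = []

    def on_recognized(self, evt):
        # evt is expected to expose evt.result.text, like the handler above
        text = (evt.result.text or "").strip()
        if text:  # skip results that carry no text
            self.segments.append(text)

    def transcript(self):
        """Return all collected utterances joined into one string."""
        return " ".join(self.segments)


collector = TranscriptCollector()
# With a real recognizer you would connect it the same way as above:
# speech_recognizer.recognized.connect(collector.on_recognized)
```

Once recognition has stopped, collector.transcript() gives the whole text and collector.segments keeps the individual utterances.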
Upvotes: 3
Reputation: 81
To build on @David Beauchemin's solution, the following code block worked for me to get the final result in a neat list:
speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING:{}'.format(evt)))
speech_recognizer.recognized.connect(lambda evt: print('RECOGNIZED:{}'.format(evt)))

all_results = []

def handle_final_result(evt):
    all_results.append(evt.result.text)

speech_recognizer.recognized.connect(handle_final_result)
speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED:{}'.format(evt)))
speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
speech_recognizer.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))
# stop_cb is the stop callback defined in the continuous recognition sample
speech_recognizer.session_stopped.connect(stop_cb)
speech_recognizer.canceled.connect(stop_cb)
Upvotes: 1
Reputation: 4736
Check the Azure python sample: https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/python/console/speech_sample.py
Or other language samples: https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples
Basically, the below:
import time
import azure.cognitiveservices.speech as speechsdk

# speech_key, service_region and weatherfilename are defined at module level in the sample

def speech_recognize_continuous_from_file():
    """performs continuous speech recognition with input from an audio file"""
    # <SpeechContinuousRecognitionWithFile>
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    audio_config = speechsdk.audio.AudioConfig(filename=weatherfilename)

    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    done = False

    def stop_cb(evt):
        """callback that stops continuous recognition upon receiving an event `evt`"""
        print('CLOSING on {}'.format(evt))
        speech_recognizer.stop_continuous_recognition()
        nonlocal done
        done = True

    # Connect callbacks to the events fired by the speech recognizer
    speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
    speech_recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
    speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
    speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
    speech_recognizer.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))
    # stop continuous recognition on either session stopped or canceled events
    speech_recognizer.session_stopped.connect(stop_cb)
    speech_recognizer.canceled.connect(stop_cb)

    # Start continuous speech recognition
    speech_recognizer.start_continuous_recognition()
    while not done:
        time.sleep(.5)
    # </SpeechContinuousRecognitionWithFile>
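If you would rather get the text back than print it, the same pattern can be wrapped in a small function. This is only a sketch under my own assumptions: run_continuous is a hypothetical name, the recognizer argument is assumed to behave like speechsdk.SpeechRecognizer (the events and start/stop methods used in the sample above), and I swapped the polling loop for a threading.Event and call stop from the main thread, which is a design choice rather than something the sample requires:

```python
import threading


def run_continuous(recognizer, timeout=None):
    """Run continuous recognition until the session ends; return the recognized texts.

    `recognizer` is assumed to behave like speechsdk.SpeechRecognizer.
    `timeout` (seconds) is an optional safety limit for the wait.
    """
    done = threading.Event()
    texts = []

    # Collect the text of every finalized utterance
    recognizer.recognized.connect(lambda evt: texts.append(evt.result.text))
    # Either event is treated as "no more results", as in the sample above
    recognizer.session_stopped.connect(lambda evt: done.set())
    recognizer.canceled.connect(lambda evt: done.set())

    recognizer.start_continuous_recognition()
    done.wait(timeout)  # block here instead of polling with time.sleep
    recognizer.stop_continuous_recognition()
    return texts
```

With the objects from the sample this would be called as run_continuous(speech_recognizer), after creating speech_recognizer from speech_config and audio_config in the same way.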
Upvotes: 3
Reputation: 160
You could try this:
import azure.cognitiveservices.speech as speechsdk
import time
speech_key, service_region = "xyz", "WestEurope"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region, speech_recognition_language="it-IT")
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
speech_recognizer.session_stopped.connect(lambda evt: print('\nSESSION STOPPED {}'.format(evt)))
speech_recognizer.recognized.connect(lambda evt: print('\n{}'.format(evt.result.text)))
print('Say a few words\n\n')
speech_recognizer.start_continuous_recognition()
time.sleep(10)
speech_recognizer.stop_continuous_recognition()
speech_recognizer.session_started.disconnect_all()
speech_recognizer.recognized.disconnect_all()
speech_recognizer.session_stopped.disconnect_all()
Remember to set your preferred language. Note that this listens on the default microphone for 10 seconds rather than reading a file. It's not much, but it's a good starting point, and it works. I will continue experimenting.
Upvotes: 1