liam
liam

Reputation: 148

Edit Azure Python code to clean up Speech-to-Text output

I'm using Microsoft Azure's speech-to-text API and it's working well but the output is cumbersome and I'd like to clean it up so that only the recognized speech is displayed.

this is what the output looks like

The python snippet that azure provides is:

try:
    import azure.cognitiveservices.speech as speechsdk
import sys
sys.exit(1)

speech_key, service_region = "***", "***"

weatherfilename = os.path.join(
    os.path.dirname(__file__),
    'orf_audio_2',
    '716_anton.wav')
# def speech_recognize_once_from_file():
    """performs one-shot speech recognition with input from an audio     file"""
# <SpeechRecognitionWithFile>
speech_config = speechsdk.SpeechConfig(subscription=speech_key,     region=service_region)
audio_config = speechsdk.audio.AudioConfig(filename=weatherfilename)
# Creates a speech recognizer using a file as audio input.
# The default language is "en-us".
speech_recognizer =     speechsdk.SpeechRecognizer(speech_config=speech_config,     audio_config=audio_config)

start_continuous_recognition() instead.
result = speech_recognizer.recognize_once()

# Check the result
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized: {}".format(result.text))
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print("Speech Recognition canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(cancellation_details.error_details))
# </SpeechRecognitionWithFile>

Upvotes: 1

Views: 389

Answers (1)

Jay Gong
Jay Gong

Reputation: 23782

result.text in the sample code is the simplest output of recognized speech.

My test with default microphone:

enter image description here


Please refer to below fragment of code which works for me.

import azure.cognitiveservices.speech as speechsdk
import time
# Creates an instance of a speech config with specified subscription key and service region.
# Replace with your own subscription key and service region (e.g., "westus").
speech_key, service_region = "***", "***"

weatherfilename = "D:\\whatstheweatherlike.wav"

speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
audio_config = speechsdk.audio.AudioConfig(filename=weatherfilename)

# Creates a recognizer with the given settings
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
speech_recognizer.session_stopped.connect(lambda evt: print('\nSESSION STOPPED {}'.format(evt)))
speech_recognizer.recognized.connect(lambda evt: print('\n{}'.format(evt.result.text)))

# print('Say a few words\n\n')
speech_recognizer.start_continuous_recognition()
time.sleep(10)
speech_recognizer.stop_continuous_recognition()

speech_recognizer.session_started.disconnect_all()
speech_recognizer.recognized.disconnect_all()
speech_recognizer.session_stopped.disconnect_all()

And the output looks like:

enter image description here

Upvotes: 1

Related Questions