Reputation: 21
I am using the Azure SpeechSynthesizer libraries in Python. I have written code that translates some text into speech, and I am finding that you need to call get() on the result to actually have it perform the speech synthesis. But this get() call is essentially blocking.
import azure.cognitiveservices.speech as speechsdk

# speech_config is a speechsdk.SpeechConfig built from my subscription key and region
pull_stream = speechsdk.audio.PullAudioOutputStream()
stream_config = speechsdk.audio.AudioOutputConfig(stream=pull_stream)
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=stream_config)
result = speech_synthesizer.speak_text_async(text)
result.get()
del speech_synthesizer
If I don't call result.get(), I am unable to pull any data from the stream. But when I do call result.get(), it blocks for several seconds while translating the text to speech. I have run this with an AudioOutputConfig pointing at a filename so it just saves to a wave file, and the timing is about the same, so I know it is doing the same work regardless of whether I get the output as a stream or a file.
Are there any pointers on how to get this to work asynchronously, so I can pull from the stream as it is translating rather than having to wait until it completes? Roughly, what I would like is something like the sketch below.
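This is an untested sketch of the pull-based reading I have in mind (speech_config and text are set up as in the snippet above; the buffer size is arbitrary and process() is just a stand-in for whatever consumes the audio). Today this only yields data once the blocking get() has finished:

import azure.cognitiveservices.speech as speechsdk

pull_stream = speechsdk.audio.PullAudioOutputStream()
stream_config = speechsdk.audio.AudioOutputConfig(stream=pull_stream)
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=stream_config)

speech_synthesizer.speak_text_async(text)   # deliberately no .get() here

audio_chunk = bytes(32000)                   # arbitrary buffer size
filled = pull_stream.read(audio_chunk)       # I want this to return data while synthesis is still running
while filled > 0:
    process(audio_chunk[:filled])            # process() is a hypothetical consumer
    filled = pull_stream.read(audio_chunk)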
Upvotes: 0
Views: 646
Reputation: 21
Using Dasani's code, I was able to modify it and get it to work. I had to convert the PCM data to WAV format before saving it out to a file, and I had a really weird hack where I needed to remove part of the buffer I get in the synthesizing callback; see the code to understand. I played around with various sizes, and 46 bytes seems to be the right amount (a possible way to avoid the hard-coded offset is sketched after the code).
import azure.cognitiveservices.speech as speechsdk
import time
import wave
import io
subscription_key = "<speech_key>"
region = "<speech_region>"
text = "Hello Kamali, welcome."
output_wave = "output.wav"
audio_buffer = io.BytesIO()
still_synthesizing = True
def synthesis_callback(evt):
    global audio_buffer
    # Each synthesizing chunk appears to carry a 46-byte header; strip it and keep only the PCM data
    header_offset = 46
    chunk_size = len(evt.result.audio_data) - header_offset
    if evt.result.reason == speechsdk.ResultReason.SynthesizingAudio:
        audio_buffer.write(evt.result.audio_data[-chunk_size:])
    elif evt.result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        print("Speech synthesis completed.")
def completed_callback(evt):
    global still_synthesizing
    print("Synthesis completed")
    still_synthesizing = False
pull_stream = speechsdk.audio.PullAudioOutputStream()
stream_config = speechsdk.audio.AudioOutputConfig(stream=pull_stream)
speech_config = speechsdk.SpeechConfig(subscription=subscription_key, region=region)
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=stream_config)
speech_synthesizer.synthesis_started.connect(lambda evt: print("Synthesis started"))
speech_synthesizer.synthesizing.connect(synthesis_callback)
speech_synthesizer.synthesis_completed.connect(completed_callback)
result = speech_synthesizer.speak_text_async(text)
# No need to make a .get() call
# No need to remove speech_synthesizer
# del speech_synthesizer
# Give it time to work asynchronously
while still_synthesizing:
    time.sleep(.1)
# Save the PCM data to a WAV file
audio_buffer.seek(0)
with wave.open(output_wave, 'wb') as wav_file:
    wav_file.setnchannels(1)      # Mono
    wav_file.setsampwidth(2)      # 16-bit samples (2 bytes)
    wav_file.setframerate(16000)  # 16 kHz sample rate
    wav_file.writeframes(audio_buffer.getvalue())
print(f"Audio saved to {output_wave}")
Upvotes: 0
Reputation: 3649
I tried the following code to convert text to speech using result = speech_synthesizer.speak_text_async(text).get() with a .wav file, and it successfully converted the text to speech.
Code :
import azure.cognitiveservices.speech as speechsdk
import threading
subscription_key = "<speech_key>"
region = "<speech_region>"
text = "Hello Kamali,welcome."
output_file = "output.wav"
def synthesis_callback(evt):
    """
    Callback function to handle speech synthesis events.
    """
    if evt.result.reason == speechsdk.ResultReason.SynthesizingAudio:
        audio_data = evt.result.audio_data
        with open(output_file, "ab") as f:
            f.write(audio_data)
    elif evt.result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        print("Speech synthesis completed.")
pull_stream = speechsdk.audio.PullAudioOutputStream()
stream_config = speechsdk.audio.AudioOutputConfig(stream=pull_stream)
speech_config = speechsdk.SpeechConfig(subscription=subscription_key, region=region)
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=stream_config)
speech_synthesizer.synthesis_started.connect(lambda evt: print("Synthesis started"))
speech_synthesizer.synthesizing.connect(synthesis_callback)
speech_synthesizer.synthesis_completed.connect(lambda evt: print("Synthesis completed"))
result = speech_synthesizer.speak_text_async(text).get()
del speech_synthesizer
print(f"Audio saved to {output_file}")
Output :
The code above successfully converted the text to speech, producing the following output.
C:\Users\xxxxxxxxx\kamali> python test.py
Synthesis started
Audio saved to output.wav
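One small caution with this approach: synthesis_callback opens output_file in append mode for every chunk, so rerunning the script appends new audio onto any existing output.wav. Truncating the file once before starting synthesis avoids that, for example:

# Start each run with an empty output file so appended chunks don't pile onto a previous run's audio.
open(output_file, "wb").close()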
Upvotes: 0