I'm trying to play the sound in the numpy array created by Coqui TTS directly with pyaudio, but I'm failing miserably.
from TTS.api import TTS
from subprocess import call
import pyaudio
# Running a multi-speaker and multi-lingual model
# List available 🐸TTS models and choose the first one
model_name = TTS.list_models()[0]
# Init TTS
tts = TTS(model_name)
# Run TTS
text="King Charles III. King of the United Kingdom and 14 other Commonwealth realms. Prince Charles acceded to the throne on September 8 2022 upon the death of his mother, Queen Elizabeth II. He was the longest-serving heir apparent in British history and was the oldest person to assume the throne, doing so at the age of 73."
# ❗ Since this model is multi-speaker and multi-lingual, we must set the target speaker and the language
speaker=tts.speakers[0]
print(speaker)
language=tts.languages[0]
print(language)
# Text to speech with a numpy output
data = tts.tts(text, speaker=speaker, language=language)
print(data)
# Text to speech to a file
#tts.tts_to_file(text=text, speaker=speaker, file_path="output.wav")
#call(['aplay', 'output.wav'])
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 16000
play=pyaudio.PyAudio()
stream_play=play.open(format=FORMAT,
channels=CHANNELS,
rate=RATE,
output=True
)
stream_play.write(data)
stream_play.stop_stream()
stream_play.close()
play.terminate()
I have commented out the version where I write the sound to a WAV file and then play that file with aplay; that works well.
However, I would like to pass the result directly to pyaudio to play, rather than writing a WAV file each time.
I understand that TTS creates a numpy array, but I don't know how to convert that to sound.
I don't understand the settings that pyaudio needs to read the array correctly either. I've tried trial and error with no luck, and now I'm trying to understand the mechanics, but it's beyond my ken.
Any help or pointers would be much appreciated.
Rupert
I got this working with sounddevice instead of pyaudio, which you may want to switch to.
...
import os
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'  # avoid a duplicate-library clash in OpenMP (may not be required; I haven't tidied up previous attempts)
import sounddevice as sd
...
wav = tts.tts(text, speaker=speaker, language=language)
sd.play(wav, samplerate=22050)
status = sd.wait()  # block until playback has finished
I found the sample rate by trial and error. As far as I can tell, it differs from model to model.
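Rather than guessing, you may be able to read the rate from the loaded model. In the Coqui TTS versions I've looked at, the `TTS` object holds a `synthesizer` whose `output_sample_rate` attribute is the rate it uses when writing WAV files. Treat that attribute path as an assumption and check it against your installed version; `model_sample_rate` is just a hypothetical helper name, and the 22050 fallback is simply the value that worked for me.

```python
def model_sample_rate(tts, fallback: int = 22050) -> int:
    """Return the model's output sample rate, or `fallback` if it can't be found.

    Assumes a Coqui TTS `TTS` object exposing `tts.synthesizer.output_sample_rate`
    (an assumption about the library's internals, not a documented guarantee).
    """
    # getattr with a default keeps this safe if either attribute is missing.
    synthesizer = getattr(tts, "synthesizer", None)
    rate = getattr(synthesizer, "output_sample_rate", None)
    return int(rate) if rate else fallback
```

With that in place, the playback call becomes `sd.play(wav, samplerate=model_sample_rate(tts))` followed by `sd.wait()`.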
Note: I had some install conflicts while setting this up, but pinning numpy to version 1.23.0 fixed them; you may have to do the same.
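If you'd rather stay with pyaudio, the same idea should carry over. As far as I can tell, `tts.tts()` gives you mono floating-point samples (roughly in the range -1.0 to 1.0), while the stream in the question was opened as 16-bit stereo at 16 kHz, and `stream.write()` expects raw bytes rather than a list or array. A minimal sketch, assuming mono float output and a 22050 Hz model; `samples_to_float32_bytes` and `play_with_pyaudio` are hypothetical helper names, not part of either library:

```python
import numpy as np


def samples_to_float32_bytes(samples) -> bytes:
    """Pack float samples (expected range -1.0..1.0) into raw native-endian float32 bytes."""
    # Clip so out-of-range values can't produce harsh distortion on playback.
    audio = np.clip(np.asarray(samples, dtype=np.float32), -1.0, 1.0)
    return audio.astype(np.float32, copy=False).tobytes()


def play_with_pyaudio(samples, sample_rate: int = 22050) -> None:
    """Play mono float samples through PyAudio, blocking until done."""
    # Imported here so the conversion helper above works without PyAudio installed.
    import pyaudio

    pa = pyaudio.PyAudio()
    try:
        stream = pa.open(
            format=pyaudio.paFloat32,  # matches the float32 bytes we write
            channels=1,                # assumed: the TTS output is mono
            rate=sample_rate,          # must match the model's output rate
            output=True,
        )
        try:
            stream.write(samples_to_float32_bytes(samples))
        finally:
            stream.stop_stream()
            stream.close()
    finally:
        pa.terminate()
```

Usage would be something like `play_with_pyaudio(tts.tts(text, speaker=speaker, language=language), sample_rate=22050)`. Writing float32 directly avoids the int16 scaling step; if you do want `paInt16`, you would multiply the clipped samples by 32767 and convert with `astype(np.int16)` before calling `tobytes()`.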