Reputation: 55
I'm trying to get Google TTS to read aloud a short set of words and pause between each word. An example of the kind of SSML I send to the Google Cloud:
<speak>chaume<break time="3s"/> cuivré, relatif au cuivre</speak>
The first word gets read, then the voice pauses for three seconds, but everything that comes after gets dropped down. I have successfully had TTS read longer sentences that contained breaks, such as this one, with identical code:
<speak>Se pure vagolavano allora per una Parma stupenda, prima dello <break time="3s"/>scempio della Bassa dei Magnani orrendamente ricostruita.</speak>
There does not seam to be any difference between the two samples, what is it that goes wrong with the first one?
My very slightly customized version of the synthesizing function is the following:
def synthesize_text(ssml_text,file_name,tts_lang,tts_voice_name):
"""Synthesizes speech from the input string of text."""
client = texttospeech.TextToSpeechClient(credentials=credentials)
input_text = texttospeech.SynthesisInput(ssml=ssml_text)
# Note: the voice can also be specified by name.
# Names of voices can be retrieved with client.list_voices().
voice = texttospeech.VoiceSelectionParams(
language_code=tts_lang,
name=tts_voice_name,
ssml_gender=texttospeech.SsmlVoiceGender.FEMALE,
)
audio_config = texttospeech.AudioConfig(
audio_encoding=texttospeech.AudioEncoding.MP3
)
response = client.synthesize_speech(
request={"input": input_text, "voice": voice, "audio_config": audio_config}
)
# The response's audio_content is binary.
with open(f"{home}/Documents/{file_name}.mp3", "wb") as out:
out.write(response.audio_content)
Upvotes: 2
Views: 421
Reputation: 55
Well it proved enough to delete the following lines:
name=tts_voice_name,
ssml_gender=texttospeech.SsmlVoiceGender.FEMALE
The voice_name
was fr-FR-Standard-A
, a WaveNet voice. Whereas the language code was fr-CA
. I'm quite sure the discrepancy caused the strange behaviour.
Upvotes: 1