Reputation: 4426
I am new to Google Cloud's Text-to-speech. The docs show the <prosody>
tag with rate
and pitch
attributes. But these do not make a difference in my requests. For example, if I use rate="slow"
or rate="fast"
, or pitch="+2st"
or pitch="-2st"
, the result is the same and different from the example on the docs, which has a slower rate and lower tone.
I ensured the latest version with:
python3 -m pip install --upgrade google-cloud-texttospeech
Minimal reproducible example:
import os
from google.cloud import texttospeech
AUDIO_CONFIG = texttospeech.AudioConfig(
audio_encoding=texttospeech.AudioEncoding.LINEAR16)
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/file"
tts_client = texttospeech.TextToSpeechClient()
voice = texttospeech.VoiceSelectionParams(
language_code="en-US",
name= "en-US-Wavenet-A"
)
ssml_input = texttospeech.SynthesisInput(
ssml='<prosody rate="fast" pitch="+2st">Can you hear me now?</prosody>'
# or this one:
#ssml='<prosody rate="slow" pitch="-2st">Can you hear me now?</prosody>'
)
response = tts_client.synthesize_speech(
input=ssml_input, voice=voice, audio_config=AUDIO_CONFIG
)
with open("/tmp/cloud.wav", 'wb') as out:
# Write the response to the output file.
out.write(response.audio_content)
How can I use Google Cloud's rate and pitch prosody attributes?
Upvotes: 1
Views: 431
Reputation: 1552
According to this document, when you are writing a SSML script inside Text-to-Speech code, the format for the SSML script should be like :
<speak>
<prosody rate="slow" pitch="low">Hi good morning have a nice day</prosody>
</speak>
You can refer to the below mentioned piece of code, I tried at my end and it is working for me.
Code 1 :
I used pitch as low and rate as slow .
from google.cloud import texttospeech
client = texttospeech.TextToSpeechClient()
# Sets the text input to be synthesized
synthesis_input = texttospeech.SynthesisInput(
ssml= '<speak><prosody rate="slow" pitch="low">Hi good morning have a nice day</prosody></speak>'
)
# Builds the voice request, selects the language code ("en-US") and
# the SSML voice gender ("MALE")
voice = texttospeech.VoiceSelectionParams(
language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.MALE
)
# Selects the type of audio file to return
audio_config = texttospeech.AudioConfig(
audio_encoding=texttospeech.AudioEncoding.MP3
)
# Performs the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(
input=synthesis_input, voice=voice, audio_config=audio_config
)
# Writes the synthetic audio to the output file.
with open("output.mp3", "wb") as out:
# Write the response to the output file.
out.write(response.audio_content)
print('Audio content written to file "output.mp3"')
Audio output : output audio
Code 2 :
I used a rate as fast and pitch as +5st.
from google.cloud import texttospeech
client = texttospeech.TextToSpeechClient()
# Sets the text input to be synthesized
synthesis_input = texttospeech.SynthesisInput(
ssml= '<speak><prosody rate="fast" pitch="+5st">Hi good morning have a nice day</prosody></speak>'
)
# Builds the voice request, selects the language code ("en-US") and
# the SSML voice gender ("MALE")
voice = texttospeech.VoiceSelectionParams(
language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.MALE
)
# Selects the type of audio file to return
audio_config = texttospeech.AudioConfig(
audio_encoding=texttospeech.AudioEncoding.MP3
)
# Performs the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(
input=synthesis_input, voice=voice, audio_config=audio_config
)
# Writes the synthetic audio to the output file.
with open("output.mp3", "wb") as out:
# Write the response to the output file.
out.write(response.audio_content)
print('Audio content written to file "output.mp3"')
Audio output : output audio
Upvotes: 4