Google Cloud's rate and pitch prosody attributes

Question

I am new to Google Cloud's Text-to-Speech. The docs show the <prosody> tag with rate and pitch attributes, but these make no difference in my requests. For example, whether I use rate="slow" or rate="fast", or pitch="+2st" or pitch="-2st", the result is the same, and it differs from the example in the docs, which has a slower rate and a lower tone.

I made sure I have the latest version installed with:

python3 -m pip install --upgrade google-cloud-texttospeech

Minimal reproducible example:

import os

from google.cloud import texttospeech

AUDIO_CONFIG = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.LINEAR16)

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/file"

tts_client = texttospeech.TextToSpeechClient()
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US",
    name="en-US-Wavenet-A"
)

ssml_input = texttospeech.SynthesisInput(
    ssml='<prosody rate="slow" pitch="-2st">Can you hear me now?</prosody>'
    # or this one:
    #ssml='<prosody rate="fast" pitch="+2st">Can you hear me now?</prosody>'
)

response = tts_client.synthesize_speech(
    input=ssml_input, voice=voice, audio_config=AUDIO_CONFIG
)

with open("/tmp/cloud.wav", 'wb') as out:
    # Write the response to the output file.
    out.write(response.audio_content)

How can I use Google Cloud's rate and pitch prosody attributes?

Sandeep Mohanty · Accepted Answer

According to the documentation, when you write an SSML script inside Text-to-Speech code, the script should be wrapped in <speak> tags, with the <prosody> element inside them:

    <speak>
        <prosody rate="slow" pitch="low">Hi good morning have a nice day</prosody>
    </speak>
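As a rough sketch of that format, the SSML string (a <speak> root with a <prosody> element inside it) can be assembled by a small helper. The name build_prosody_ssml and the "medium" defaults are my own choices for illustration, not part of the client library. The text is escaped so that characters such as & do not break the markup.

```python
from xml.sax.saxutils import escape, quoteattr


def build_prosody_ssml(text, rate="medium", pitch="medium"):
    """Wrap text in <speak><prosody ...> so it can be passed as ssml=...

    Hypothetical helper for illustration; rate and pitch are inserted
    as given (for example "slow", "fast", "low", "+5st").
    """
    return (
        f"<speak><prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>"
        f"{escape(text)}"
        "</prosody></speak>"
    )


ssml = build_prosody_ssml("Hi good morning have a nice day", rate="slow", pitch="low")
print(ssml)
```

The resulting string can then be passed as texttospeech.SynthesisInput(ssml=ssml).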

You can refer to the code below. I tried it on my end and it works for me.

Code 1:

I used a low pitch and a slow rate.


from google.cloud import texttospeech
client = texttospeech.TextToSpeechClient()

# Sets the text input to be synthesized
synthesis_input = texttospeech.SynthesisInput(
   ssml='<speak><prosody rate="slow" pitch="low">Hi good morning have a nice day</prosody></speak>'
)

# Builds the voice request, selects the language code ("en-US") and
# the SSML voice gender ("MALE")
voice = texttospeech.VoiceSelectionParams(
   language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.MALE
)

# Selects the type of audio file to return
audio_config = texttospeech.AudioConfig(
   audio_encoding=texttospeech.AudioEncoding.MP3
)

# Performs the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(
   input=synthesis_input, voice=voice, audio_config=audio_config
)

# Writes the synthetic audio to the output file.
with open("output.mp3", "wb") as out:
   # Write the response to the output file.
   out.write(response.audio_content)
   print('Audio content written to file "output.mp3"')


Code 2:

I used a fast rate and a pitch of +5st.


from google.cloud import texttospeech
client = texttospeech.TextToSpeechClient()

# Sets the text input to be synthesized
synthesis_input = texttospeech.SynthesisInput(
   ssml='<speak><prosody rate="fast" pitch="+5st">Hi good morning have a nice day</prosody></speak>'
)

# Builds the voice request, selects the language code ("en-US") and
# the SSML voice gender ("MALE")
voice = texttospeech.VoiceSelectionParams(
   language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.MALE
)

# Selects the type of audio file to return
audio_config = texttospeech.AudioConfig(
   audio_encoding=texttospeech.AudioEncoding.MP3
)

# Performs the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(
   input=synthesis_input, voice=voice, audio_config=audio_config
)

# Writes the synthetic audio to the output file.
with open("output.mp3", "wb") as out:
   # Write the response to the output file.
   out.write(response.audio_content)
   print('Audio content written to file "output.mp3"')
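If rate and pitch still seem to have no effect, it may be worth checking the SSML string locally before sending it. Below is a minimal sanity check. It assumes the only requirements are a <speak> root with a <prosody> element somewhere inside it. check_ssml is a hypothetical helper name, and it does not replicate whatever validation the service itself performs.

```python
import xml.etree.ElementTree as ET


def check_ssml(ssml):
    """Return a list of problems found in an SSML string (empty if none).

    Hypothetical local check: well-formed XML, <speak> as the root,
    and at least one <prosody> element.
    """
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems = []
    if root.tag != "speak":
        problems.append(f"root element is <{root.tag}>, expected <speak>")
    # root.iter() also visits the root itself, so a bare <prosody> root counts.
    if next(root.iter("prosody"), None) is None:
        problems.append("no <prosody> element found")
    return problems


for candidate in (
    '<speak><prosody rate="fast" pitch="+5st">Hi good morning</prosody></speak>',
    '<prosody rate="fast" pitch="+5st">Hi good morning</prosody>',
    "Hi good morning",
):
    print(candidate, "->", check_ssml(candidate))
```

A plain string with no tags at all fails the first check, which is a quick way to notice that the markup was lost somewhere before the request was made.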

