emonigma

Reputation: 4426

Google Cloud's rate and pitch prosody attributes

I am new to Google Cloud Text-to-Speech. The docs show the <prosody> tag with rate and pitch attributes, but these make no difference in my requests. For example, whether I use rate="slow" or rate="fast", or pitch="+2st" or pitch="-2st", the result is the same, and it differs from the example in the docs, which has a slower rate and lower tone.

I made sure I have the latest version with:

python3 -m pip install --upgrade google-cloud-texttospeech

Minimal reproducible example:

import os

from google.cloud import texttospeech

AUDIO_CONFIG = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.LINEAR16)

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/file"

tts_client = texttospeech.TextToSpeechClient()
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US",
    name="en-US-Wavenet-A"
)

ssml_input = texttospeech.SynthesisInput(
    ssml='<prosody rate="fast" pitch="+2st">Can you hear me now?</prosody>'
    # or this one:
    #ssml='<prosody rate="slow" pitch="-2st">Can you hear me now?</prosody>'
)

response = tts_client.synthesize_speech(
    input=ssml_input, voice=voice, audio_config=AUDIO_CONFIG
)

with open("/tmp/cloud.wav", 'wb') as out:
    # Write the response to the output file.
    out.write(response.audio_content)

How can I use Google Cloud's rate and pitch prosody attributes?

Upvotes: 1

Views: 431

Answers (1)

Sandeep Mohanty

Reputation: 1552

According to this document, when you pass SSML to Text-to-Speech, the markup should be wrapped in a <speak> element, like this:

<speak>
    <prosody rate="slow" pitch="low">Hi good morning have a nice day</prosody>
</speak>
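To avoid forgetting the wrapper, the SSML string can be built by a small helper. This is only a sketch: build_prosody_ssml is a hypothetical name, not part of the google-cloud-texttospeech library, and it uses only the Python standard library to escape the text and quote the attribute values.

```python
from xml.sax.saxutils import escape, quoteattr


def build_prosody_ssml(text, rate="medium", pitch="medium"):
    """Wrap text in <speak><prosody ...> so the prosody attributes sit
    inside a <speak> root element.

    Hypothetical helper, not part of the Text-to-Speech client library.
    """
    # Escape &, < and > so the text cannot break the markup.
    body = escape(text)
    # quoteattr() adds the surrounding quotes and escapes the value.
    return (
        f"<speak><prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>"
        f"{body}</prosody></speak>"
    )


if __name__ == "__main__":
    print(build_prosody_ssml("Hi good morning have a nice day", "slow", "low"))
```

The returned string can be passed as the ssml argument of texttospeech.SynthesisInput.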

You can refer to the code below; I tried it on my end and it works for me.

Code 1:

I set pitch to low and rate to slow.


from google.cloud import texttospeech
client = texttospeech.TextToSpeechClient()

# Sets the text input to be synthesized
synthesis_input = texttospeech.SynthesisInput(
   ssml='<speak><prosody rate="slow" pitch="low">Hi good morning have a nice day</prosody></speak>'
)

# Builds the voice request, selects the language code ("en-US") and
# the SSML voice gender ("MALE")
voice = texttospeech.VoiceSelectionParams(
   language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.MALE
)

# Selects the type of audio file to return
audio_config = texttospeech.AudioConfig(
   audio_encoding=texttospeech.AudioEncoding.MP3
)

# Performs the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(
   input=synthesis_input, voice=voice, audio_config=audio_config
)

# Writes the synthetic audio to the output file.
with open("output.mp3", "wb") as out:
   # Write the response to the output file.
   out.write(response.audio_content)
   print('Audio content written to file "output.mp3"')

Audio output: output audio

Code 2:

I set rate to fast and pitch to +5st.


from google.cloud import texttospeech
client = texttospeech.TextToSpeechClient()

# Sets the text input to be synthesized
synthesis_input = texttospeech.SynthesisInput(
   ssml='<speak><prosody rate="fast" pitch="+5st">Hi good morning have a nice day</prosody></speak>'
)

# Builds the voice request, selects the language code ("en-US") and
# the SSML voice gender ("MALE")
voice = texttospeech.VoiceSelectionParams(
   language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.MALE
)

# Selects the type of audio file to return
audio_config = texttospeech.AudioConfig(
   audio_encoding=texttospeech.AudioEncoding.MP3
)

# Performs the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(
   input=synthesis_input, voice=voice, audio_config=audio_config
)

# Writes the synthetic audio to the output file.
with open("output.mp3", "wb") as out:
   # Write the response to the output file.
   out.write(response.audio_content)
   print('Audio content written to file "output.mp3"')

Audio output: output audio
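If you want to catch a missing wrapper before sending a request, a local check along these lines may help. It is a rough sketch using only the standard library: has_speak_root is a hypothetical helper, and it only checks that the string parses as XML with <speak> as its root, not that the service will accept every tag inside.

```python
import xml.etree.ElementTree as ET


def has_speak_root(ssml):
    """Return True if ssml is well-formed XML whose root element is <speak>.

    Hypothetical helper for a local sanity check; it does not call the API.
    """
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError:
        return False
    # Drop a namespace prefix such as "{http://...}speak" if one is present.
    return root.tag.rsplit("}", 1)[-1] == "speak"


if __name__ == "__main__":
    # SSML from the question: no <speak> wrapper.
    question_ssml = '<prosody rate="fast" pitch="+2st">Can you hear me now?</prosody>'
    # The same markup wrapped in <speak>.
    wrapped_ssml = "<speak>" + question_ssml + "</speak>"
    print(has_speak_root(question_ssml))
    print(has_speak_root(wrapped_ssml))
```

With the string from the question the check fails, and with the wrapped version it passes, which matches the difference between the question's code and the two examples above.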

Upvotes: 4
