Reputation: 1163

How to use Google's Text-to-Speech API in Python

My API key is ready to make requests and get speech from text with Google.
I tried these commands and many more.
I haven't found a straightforward way in the docs to get started with Python. I don't know where my API key goes along with the JSON and URL.

One solution in their docs is for cURL, but it involves downloading a txt file after the request that has to be sent back to them in order to get the audio file. Is there a way to do this in Python that doesn't involve the txt file I have to return to them? I just want my list of strings returned as audio files.

(I put my actual key in the block above. I'm just not going to share it here.)

Upvotes: 4

Answers (3)

Loïc Sacré

Reputation: 73

If you would like to avoid using the Google Python client library, you can simply do this:

import requests

url = "https://texttospeech.googleapis.com/v1beta1/text:synthesize"

text = "This is a text"

# Request body: the text, the voice to use, and the output encoding
data = {
    "input": {"text": text},
    "voice": {"name": "fr-FR-Wavenet-A", "languageCode": "fr-FR"},
    "audioConfig": {"audioEncoding": "MP3"},
}

# The API key goes in the X-Goog-Api-Key header
headers = {"content-type": "application/json", "X-Goog-Api-Key": "YOUR_API_KEY"}

r = requests.post(url=url, json=data, headers=headers)
r.raise_for_status()  # fail early on an HTTP error
content = r.json()  # dict; the base64-encoded audio is under "audioContent"

It is similar to what you did but you need to include your API key.
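
The JSON response carries the audio as a base64-encoded string in its `audioContent` field, so it has to be decoded before it is written to disk. A minimal sketch, assuming `content` is the parsed response from the request above; the helper name `save_audio` and the output path are made up for illustration:

```python
import base64


def save_audio(content, path):
    """Decode the base64 `audioContent` field and write it to `path`.

    Returns the number of audio bytes written.
    """
    if "audioContent" not in content:
        # Error responses carry an "error" object instead of audio.
        raise ValueError(f"No audio in response: {content}")
    audio_bytes = base64.b64decode(content["audioContent"])
    with open(path, "wb") as out:
        out.write(audio_bytes)
    return len(audio_bytes)
```

Calling `save_audio(content, "output.mp3")` after the request should leave a playable MP3 next to the script.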

Upvotes: 3

CodeRaptor

Reputation: 361

Configure Python App for JSON file and Install Client Library

Create a Service Account
Create a Service Account Key for that Service Account
Download the JSON key file and save it securely
Point your Python app at the JSON file through the GOOGLE_APPLICATION_CREDENTIALS environment variable
Install the library: pip install --upgrade google-cloud-texttospeech

This uses Google's Python examples, found at https://cloud.google.com/text-to-speech/docs/reference/libraries and https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/texttospeech/cloud-client/quickstart.py. Note that Google's example does not set the name parameter correctly.

Below is a modified version of the example that uses Google application credentials and a female WaveNet voice.

import os

# Point the client library at the service account key file
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/home/yourproject-12345.json"

from google.cloud import texttospeech

# Instantiates a client
client = texttospeech.TextToSpeechClient()

# Set the text input to be synthesized
synthesis_input = texttospeech.types.SynthesisInput(text="Do no evil!")

# Build the voice request: select the language code ("en-US"),
# ****** the NAME,
# and the SSML voice gender ("female")
voice = texttospeech.types.VoiceSelectionParams(
    language_code='en-US',
    name='en-US-Wavenet-C',
    ssml_gender=texttospeech.enums.SsmlVoiceGender.FEMALE)

# Select the type of audio file you want returned
audio_config = texttospeech.types.AudioConfig(
    audio_encoding=texttospeech.enums.AudioEncoding.MP3)

# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(synthesis_input, voice, audio_config)

# The response's audio_content is binary.
with open('output.mp3', 'wb') as out:
    # Write the response to the output file.
    out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')

Voices, Name, Language Code, SSML Gender, etc.

List of Voices: https://cloud.google.com/text-to-speech/docs/voices

In the code example above, I changed the voice from Google's example code to include the name parameter, to use a WaveNet voice (much improved, but more expensive at $16 per million characters), and to set the SSML gender to FEMALE.

voice = texttospeech.types.VoiceSelectionParams(
    language_code='en-US',
    name='en-US-Wavenet-C',
    ssml_gender=texttospeech.enums.SsmlVoiceGender.FEMALE)

Upvotes: 10

Ant

Reputation: 1163

Found the answer and lost the link among 150 Google documentation pages I had open.

# Set the credentials from code (since I'm using a Jupyter Notebook)
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/Path/to/JSON/file/jsonfile.json"
from google.cloud import texttospeech

# Instantiates a client
client = texttospeech.TextToSpeechClient()

# Set the text input to be synthesized
synthesis_input = texttospeech.types.SynthesisInput(text="Hello, World!")

# Build the voice request, select the language code ("en-US") and the ssml
# voice gender ("neutral")
voice = texttospeech.types.VoiceSelectionParams(
    language_code='en-US',
    ssml_gender=texttospeech.enums.SsmlVoiceGender.NEUTRAL)

# Select the type of audio file you want returned
audio_config = texttospeech.types.AudioConfig(
    audio_encoding=texttospeech.enums.AudioEncoding.MP3)

# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(synthesis_input, voice, audio_config)

# The response's audio_content is binary.
with open('output.mp3', 'wb') as out:
    # Write the response to the output file.
    out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')

My time-consuming pursuit was trying to send the request as JSON with Python, but this goes through their own modules, which works fine. Notice the default voice gender is 'neutral'.
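
For the original goal of turning a list of strings into audio files, the same call can be wrapped in a loop. A rough sketch: `synthesize` is a hypothetical callable, not part of Google's library, that takes a string and returns MP3 bytes (for example, a small wrapper that builds the request as in the code above and returns `response.audio_content`). The `save_all` name and the `output_N.mp3` naming scheme are also assumptions.

```python
import os


def save_all(texts, synthesize, out_dir="."):
    """Write one MP3 per string and return the list of file paths.

    `synthesize` is a caller-supplied function: str -> bytes.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for index, text in enumerate(texts):
        path = os.path.join(out_dir, f"output_{index}.mp3")
        with open(path, "wb") as out:
            out.write(synthesize(text))
        paths.append(path)
    return paths
```

Keeping the synthesis step as a parameter means the loop can be tried with a stub function first, before spending any API quota.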

Upvotes: 2
