Reputation: 1143
My key is ready to go to make requests and get speech from text from Google.
I tried these commands and many more.
The docs offer no straight forward solutions to getting started with Python that I've found. I don't know where my API key goes along with the JSON and URL
One solution in their docs here is for CURL.. But involves downloading a txt after the request that has to be sent back to them in order to get the file. Is there a way to do this in Python that doesn't involve the txt I have to return them? I just want my list of strings returned as audio files.
(I put my actual key in the block above. I'm just not going to share it here.)
Upvotes: 4
Views: 15083
Reputation: 73
If you would like to avoid using the google Python API, you can simply do this:
import requests
import json
url = "https://texttospeech.googleapis.com/v1beta1/text:synthesize"
text = "This is a text"
data = {
"input": {"text": text},
"voice": {"name": "fr-FR-Wavenet-A", "languageCode": "fr-FR"},
"audioConfig": {"audioEncoding": "MP3"}
};
headers = {"content-type": "application/json", "X-Goog-Api-Key": "YOUR_API_KEY" }
r = requests.post(url=url, json=data, headers=headers)
content = json.loads(r.content)
It is similar to what you did but you need to include your API key.
Upvotes: 3
Reputation: 361
pip install --upgrade google-cloud-texttospeech
Using Google's Python examples found: https://cloud.google.com/text-to-speech/docs/reference/libraries Note: In Google's example it is not including the name parameter correctly. and https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/texttospeech/cloud-client/quickstart.py
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="/home/yourproject-12345.json"
from google.cloud import texttospeech
# Instantiates a client
client = texttospeech.TextToSpeechClient()
# Set the text input to be synthesized
synthesis_input = texttospeech.types.SynthesisInput(text="Do no evil!")
# Build the voice request, select the language code ("en-US")
# ****** the NAME
# and the ssml voice gender ("neutral")
voice = texttospeech.types.VoiceSelectionParams(
language_code='en-US',
name='en-US-Wavenet-C',
ssml_gender=texttospeech.enums.SsmlVoiceGender.FEMALE)
# Select the type of audio file you want returned
audio_config = texttospeech.types.AudioConfig(
audio_encoding=texttospeech.enums.AudioEncoding.MP3)
# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(synthesis_input, voice, audio_config)
# The response's audio_content is binary.
with open('output.mp3', 'wb') as out:
# Write the response to the output file.
out.write(response.audio_content)
print('Audio content written to file "output.mp3"')
List of Voices: https://cloud.google.com/text-to-speech/docs/voices
In the above code example I changed the voice from Google's example code to include the name parameter and to use the Wavenet voice (much improved but more expensive $16/million chars) and the SSML Gender to FEMALE.
voice = texttospeech.types.VoiceSelectionParams(
language_code='en-US',
name='en-US-Wavenet-C',
ssml_gender=texttospeech.enums.SsmlVoiceGender.FEMALE)
Upvotes: 10
Reputation: 1143
Found the answer and lost the link among 150 Google documentation pages I had open.
#(Since I'm using a Jupyter Notebook)
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="/Path/to/JSON/file/jsonfile.json"
from google.cloud import texttospeech
# Instantiates a client
client = texttospeech.TextToSpeechClient()
# Set the text input to be synthesized
synthesis_input = texttospeech.types.SynthesisInput(text="Hello, World!")
# Build the voice request, select the language code ("en-US") and the ssml
# voice gender ("neutral")
voice = texttospeech.types.VoiceSelectionParams(
language_code='en-US',
ssml_gender=texttospeech.enums.SsmlVoiceGender.NEUTRAL)
# Select the type of audio file you want returned
audio_config = texttospeech.types.AudioConfig(
audio_encoding=texttospeech.enums.AudioEncoding.MP3)
# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(synthesis_input, voice, audio_config)
# The response's audio_content is binary.
with open('output.mp3', 'wb') as out:
# Write the response to the output file.
out.write(response.audio_content)
print('Audio content written to file "output.mp3"')
My time consuming pursuit was to try to send the request through a JSON with Python, but this appears to be through there own modules, which works fine. Notice the default voice gender is 'neutral'.
Upvotes: 2