Reputation: 1762

How to use Phonetic or Phoneme pronunciation in google text to speech?

I have been trying for a while to get phonetic or phoneme pronunciation working with Google's text to speech, but have not managed to get it performing consistently.

I have managed to get limited results using https://tophonetics.com/. It translated "The cow went mad." to "ðə kaʊ wɛnt mæd.", but the 'the' ('ðə') was not audible. And when I tried "ðɪs ɪz səm fəˈnɛtɪk tɛkst ˈɪnˌpʊt".

Are there any SSML codes to define phonetic blocks of text? Can this format, "D,Is Iz sVm f@n'EtIk t'Ekst 'InpUt", be used instead of "ðɪs ɪz səm fəˈnɛtɪk tɛkst ˈɪnˌpʊt"?

Upvotes: 8

Answers (4)

Robin Manoli

Reputation: 2222

The main point is to use characters available to the language you are using: https://cloud.google.com/text-to-speech/docs/phonemes

Using any other character than what's listed for your language seems to cause failure.

Beyond that, there seem to be other reasons why phonemes break.

So a good starting point is code that actually works. Then add one character at a time, and see what breaks your code. The code below uses example values from the documentation - https://cloud.google.com/text-to-speech/docs/ssml#phoneme - but it is full code that actually works (instead of Google's useless examples).

from google.cloud import texttospeech

# don't forget GOOGLE_APPLICATION_CREDENTIALS !!!

def ssml_to_audio(ssml_text, outfile):
    client = texttospeech.TextToSpeechClient()
    synthesis_input = texttospeech.SynthesisInput(ssml=ssml_text)
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.MALE
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

    # Writes the synthetic audio to the output file.
    with open(outfile, "wb") as out:
        out.write(response.audio_content)
        print("Audio content written to file " + outfile)

def text_to_ssml(text):
    ssml = "<speak>{}</speak>".format(
        text.replace("\n", '\n<break time="2s"/>')
    )
    return ssml

# using ttttt to be able to hear easily if phonemes are pronounced
text = """
<phoneme alphabet="ipa" ph="ˌmænɪˈtoʊbə">ttttt</phoneme>
<phoneme alphabet="x-sampa" ph='m@"hA:g@%ni:'>ttttt</phoneme>
<phoneme alphabet="ipa" ph="sɑd̪ʰənɑ">ttttt</phoneme>
<phoneme alphabet="ipa" ph="ɑt̪ʃɑryəs">ttttt</phoneme>
"""

# the weird character below even breaks here on stack overflow, so it looks like an empty string
# but what's interesting is that the character is derived from a previously documented character belonging to google tts api:
# pause_character = "\u001a"  # Google TTS short pause character

not_working_text = text + ""

ssml = text_to_ssml(text)
print(ssml)
ssml_to_audio(ssml, "test.mp3")
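
The "add one character at a time" approach can be sketched roughly as below. This is only an illustration: incremental_ssml is a hypothetical helper name, not part of the Google library, and it splits the phoneme string by Unicode code point, so a combining diacritic arrives one step after its base letter.

```python
from xml.sax.saxutils import quoteattr


def incremental_ssml(ph, alphabet="ipa", placeholder="ttttt"):
    """Yield (prefix, ssml) pairs, growing the phoneme string one code point at a time.

    Hypothetical helper for illustration only. Synthesizing each document in
    turn shows which added character makes the phoneme tag stop working.
    """
    for end in range(1, len(ph) + 1):
        prefix = ph[:end]
        # quoteattr picks a quote style that is safe for the value,
        # which matters for X-SAMPA strings containing double quotes.
        yield prefix, "<speak><phoneme alphabet={} ph={}>{}</phoneme></speak>".format(
            quoteattr(alphabet), quoteattr(prefix), placeholder
        )


steps = list(incremental_ssml("kaʊ"))
for prefix, ssml in steps:
    print(prefix, "->", ssml)
    # With credentials configured, each step could be passed to a synthesis
    # function such as ssml_to_audio(ssml, "step.mp3") and checked by ear.
```

Listening to the outputs in order should make it clear at which prefix the placeholder "ttttt" starts being read instead of the phonemes.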

Upvotes: 0

kyo2023

Reputation: 1

A special note for Japanese: Google's sample code contains an error.

Should be:

<phoneme alphabet="yomigana" ph="^はし">端</phoneme>
<phoneme alphabet="yomigana" ph="^は!し">箸</phoneme>
<phoneme alphabet="yomigana" ph="^はし!">橋</phoneme>

The documentation currently shows (ref: https://cloud.google.com/text-to-speech/docs/phonemes#japanese_yomigana):

<phoneme alphabet="yomigana" phoneme="^はし">端</phoneme>
<phoneme alphabet="yomigana" phoneme="^は!し">箸</phoneme>
<phoneme alphabet="yomigana" phoneme="^はし!">橋</phoneme>
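
To avoid attribute typos like the one above, the tag can be generated rather than typed by hand. The following is a minimal sketch, assuming the attribute name is ph as stated in this answer; phoneme_tag is a hypothetical helper name, and only the Python standard library is used.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme_tag(alphabet, ph, text):
    """Build a <phoneme> element with the `ph` attribute (hypothetical helper).

    quoteattr chooses safe quoting for the attribute values and
    escape protects the fallback text from being parsed as markup.
    """
    return "<phoneme alphabet={} ph={}>{}</phoneme>".format(
        quoteattr(alphabet), quoteattr(ph), escape(text)
    )


hashi = phoneme_tag("yomigana", "^は!し", "箸")
print("<speak>{}</speak>".format(hashi))
```

The same helper also copes with X-SAMPA values that contain a double quote, since quoteattr then switches to single quotes around the attribute.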

Upvotes: 0

Lena Schimmel

Reputation: 7493

Google Text-to-Speech has supported the <phoneme> tag since at least spring 2021.

However, there are a lot of potential gotchas to overcome:

The demo page filters out <phoneme> tags on the client side before they even reach the API. (It does the same with the <voice> tag as pointed out here)
As with Microsoft Azure Text-to-speech (see the other answer for details), each language only supports a limited set of phonemes ("letters") that can be used.
If you use an unsupported one, the phoneme tag is completely ignored without any warning. So the official example <phoneme alphabet="ipa" ph="ˌmænɪˈtoʊbə">manitoba</phoneme> does not work with any English variant but en-US, since all others lack the "o" or "oʊ" phoneme.
It's unclear if you need to use the v1beta1 API (which I can confirm is working) or if version v1 is also ok.
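
Because an unsupported phoneme makes the tag fail silently, it can help to check a phoneme string locally before sending it. Below is a rough sketch under loud assumptions: unsupported_symbols is a hypothetical helper, and the allowed set in the example is a made-up placeholder, NOT Google's real inventory. The real list for your language has to be copied from the documentation.

```python
def unsupported_symbols(ph, allowed):
    """Return the characters of `ph` not covered by any symbol in `allowed`.

    Hypothetical helper. Uses greedy longest-match so that multi-character
    symbols such as diphthongs are recognized as a unit.
    """
    max_len = max(map(len, allowed), default=1)
    bad = []
    i = 0
    while i < len(ph):
        for size in range(min(max_len, len(ph) - i), 0, -1):
            if ph[i:i + size] in allowed:
                i += size
                break
        else:
            # No allowed symbol starts here: record the character and move on.
            bad.append(ph[i])
            i += 1
    return bad


# Placeholder set for illustration only, not a real language inventory.
example_allowed = {"m", "æ", "n", "ɪ", "t", "b", "ə", "ˈ", "ˌ"}
missing = unsupported_symbols("ˌmænɪˈtoʊbə", example_allowed)
print(missing)
```

An empty result only means every symbol is in the set you supplied. It does not guarantee that the API will accept the string.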

Upvotes: 5

MightyCurious

Reputation: 891

There is the SSML tag <phoneme> that serves your purpose.

Unfortunately, it's currently not supported in Google Cloud Text-to-speech. The available subset of SSML tags for Google Cloud is listed in the documentation. The <phoneme> tag is not in this list. An experiment using Google Cloud's text-to-speech-demo confirms that the phonemes are ignored. The content of the tag is being read as ordinary text, as has already been remarked by @Trevor in the comments.

The <phoneme> tag is, however, supported by Microsoft Azure Text-to-Speech and Amazon Polly. In both cases, the available phonemes are limited to those available in the language being used (see here for Azure and here for Polly). The Azure documentation isn't 100% clear about the exclusion of out-of-language phonemes, but practical experiments with the Azure Text-to-Speech demo confirm that they're not working properly. In some cases, they at least seem to be replaced by the nearest available equivalent in the language used.

Being restricted to the phonemes of one language severely limits the usefulness of the phoneme tag. E.g., you can't use the feature to embed correctly pronounced content in a second language, as the second language will usually have some phonemes that are not available in the first language. Concrete language pairs in which each language has some phonemes that are not available in the other one are English/German, Spanish/German, and English/Spanish.

Upvotes: 3
