Azure cognitive services text-to-speech service "whispering" style adjustments

Question

I am working on a project that requires voice-over for videos. I was looking for a free or cheap, more natural-sounding voice synthesizer and ran into an article suggesting the Azure TTS service. As of 1/23/2024, the Azure Cognitive Services text-to-speech service is still free for up to 0.5 million characters per month, which works well for what I'm doing.

I registered in Azure and created a TTS service. I chose en-US-NancyNeural as my primary voice as her "whispering" style sounded better than the others.

I would like to make the whispering voice softer than it is by default. I figured SSML is the correct approach for altering the TTS result. Can anyone share their experience tuning the options to make the whispering slower, softer, and quieter (more natural)? Although Nancy's default whispering is better than the other voices', "she" still whispers very quickly and loudly, lol.

What works well? What does not work? Please share your experience.

Here is a sample of my Node.js TTS function:

const TTSSdk = require("microsoft-cognitiveservices-speech-sdk")

async function generateSpeechFromText(name, text, tempDirectory) {
  console.log(`Generating speech from text for section: ${name}`)
  const audioFile = `${tempDirectory}/${name}.wav`

  const speechConfig = TTSSdk.SpeechConfig.fromSubscription(
    process.env.AZURE_TTS_KEY,
    process.env.AZURE_TTS_REGION
  )
  const audioConfig = TTSSdk.AudioConfig.fromAudioFileOutput(audioFile)
  speechConfig.speechSynthesisVoiceName = "en-US-NancyNeural"

  const synthesizer = new TTSSdk.SpeechSynthesizer(speechConfig, audioConfig)
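  // Assumed SSML wrapper: en-US-NancyNeural with the "whispering" style
  const ssml = `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
                  <voice name="en-US-NancyNeural">
                    <mstts:express-as style="whispering">
                      ${text}
                    </mstts:express-as>
                  </voice>
                </speak>`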

  return new Promise((resolve, reject) => {
    synthesizer.speakSsmlAsync(
      ssml,
      (result) => {
        if (result.reason === TTSSdk.ResultReason.SynthesizingAudioCompleted) {
          console.log("Synthesis finished for: " + name)
          resolve(audioFile)
        } else {
          console.error(
            "Speech synthesis failed for: " + name,
            result.errorDetails
          )
          reject(result.errorDetails)
        }
        synthesizer.close()
      },
      (err) => {
        console.error("Error during synthesis for: " + name, err)
        synthesizer.close()
        reject(err)
      }
    )
  })
}
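
For reference, here is a minimal sketch of the kind of SSML adjustments in question. It is untested against the service. It assumes that the `styledegree` attribute on `mstts:express-as` and a `prosody` element with `rate` and `volume` are the relevant knobs for this voice. `buildWhisperSsml` and `escapeXml` are hypothetical helper names (not part of the Speech SDK), and the default values are starting guesses to tune by ear, not recommended settings. The helper also escapes the text, since a raw `&` or `<` would otherwise make the SSML invalid.

```javascript
// Hypothetical helpers, not part of the Azure Speech SDK.

// Escape characters that would otherwise break the SSML (XML) document.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;")
}

// Build SSML for a whisper that is (hopefully) slower and quieter.
// All defaults below are assumptions to experiment with.
function buildWhisperSsml(text, options = {}) {
  const {
    voice = "en-US-NancyNeural",
    styleDegree = "0.8", // style intensity; the service default is 1
    rate = "-15%",       // relative speaking rate
    volume = "soft",     // named SSML volume level
  } = options

  return [
    '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"',
    '       xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">',
    `  <voice name="${voice}">`,
    `    <mstts:express-as style="whispering" styledegree="${styleDegree}">`,
    `      <prosody rate="${rate}" volume="${volume}">`,
    `        ${escapeXml(text)}`,
    "      </prosody>",
    "    </mstts:express-as>",
    "  </voice>",
    "</speak>",
  ].join("\n")
}
```

The returned string could be passed to `synthesizer.speakSsmlAsync(...)` in place of the `ssml` constant in the function above, for example `buildWhisperSsml(text, { rate: "-25%" })`. Whether a lower `styledegree` actually makes this particular voice whisper more softly is exactly the part I have not been able to confirm.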

And here is the link to the page that goes over SSML structure and events
