Azure Speech Services + Text to Speech + Silent SpeakSsmlAsync

Question

I am using Azure Text to Speech, part of the Cognitive Services.

I compose my request as SSML, and then call the function SpeakSsmlAsync.

If I choose the output format Audio24Khz160KBitRateMonoMp3, the function returns almost immediately with the speech data. But if I choose the output format Riff24Khz16BitMonoPcm, the function plays the speech through my speakers before returning the speech data.

Is there a way to use Riff24Khz16BitMonoPcm silently, so that the speech data is returned without being played aloud first?

+++

Update, 3rd August 2024: here is the code.

// Configure the speech service: credentials, output format, voice, and logging
SpeechConfig speechConfig = SpeechConfig.FromSubscription(SubscriptionKey, SubscriptionRegion);
speechConfig.OutputFormat = OutputFormat.Detailed;
speechConfig.SetSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat.Riff24Khz16BitMonoPcm);
speechConfig.SpeechSynthesisVoiceName = "de-DE-KatjaNeural";
speechConfig.SetProperty(PropertyId.Speech_LogFilename, LogServices.SpeechLogFilepath);

// Create the synthesizer (no AudioConfig is passed here)
using (SpeechSynthesizer speechSynthesizer = new SpeechSynthesizer(speechConfig))
{

    // Build the SSML request from the input text
    string strSsml = _textToSSML(Language, speechConfig.SpeechSynthesisVoiceName, strText);

    // Synthesize the SSML; the result carries the audio data
    SpeechSynthesisResult speechSynthesisResult = await speechSynthesizer.SpeakSsmlAsync(strSsml);

    // Check whether synthesis completed successfully
    if (speechSynthesisResult.Reason == ResultReason.SynthesizingAudioCompleted)
    {

        // Process the wav and save as mp3
        WaveFile waveFile = new WaveFile(speechSynthesisResult.AudioData);
        _processAndSave(waveFile, audioFilepaths);

    }
    else
    {

        // Log the synthesis failure
        new LogServices().AddTextToSpeechError(speechSynthesisResult, strText);

    }

}

Answers (1)
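
A minimal sketch of one likely fix, not a confirmed diagnosis of the code above. When a SpeechSynthesizer is constructed from a SpeechConfig alone, the Speech SDK sends the synthesized audio to the default output device. Passing a null AudioConfig as the second constructor argument attaches no playback device, so the audio is only returned in AudioData. The class and method names below (SilentSynthesis, SynthesizeSilentlyAsync) are hypothetical and only for illustration; the speechConfig and SSML string are assumed to be built as in the question.

```csharp
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

public static class SilentSynthesis
{
    // Hypothetical helper, not from the question: returns the synthesized
    // audio bytes, or null if synthesis did not complete.
    public static async Task<byte[]> SynthesizeSilentlyAsync(SpeechConfig speechConfig, string ssml)
    {
        // A null AudioConfig means no output device is attached, so nothing
        // is played. The cast picks the (SpeechConfig, AudioConfig) overload
        // explicitly, since a bare null could match more than one constructor.
        using (var synthesizer = new SpeechSynthesizer(speechConfig, (AudioConfig)null))
        using (SpeechSynthesisResult result = await synthesizer.SpeakSsmlAsync(ssml))
        {
            return result.Reason == ResultReason.SynthesizingAudioCompleted
                ? result.AudioData
                : null;
        }
    }
}
```

In the question's code this would amount to changing only the constructor call to new SpeechSynthesizer(speechConfig, (AudioConfig)null); the rest of the flow, including the WaveFile processing, could stay as it is.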