Reputation: 389
I am using Azure Text to Speech, part of the Cognitive Services.
I compose my request as SSML, and then call the function SpeakSsmlAsync.
If I choose the output format Audio24Khz160KBitRateMonoMp3, the function returns almost immediately with the speech data. But if I choose the output format Riff24Khz16BitMonoPcm, the function plays the speech through my speakers before returning the speech data.
Is there a way to request Riff24Khz16BitMonoPcm output silently, so that the speech data is returned without being played first?
+++
Update 3rd August 2024, here is the code:
//
SpeechConfig speechConfig = SpeechConfig.FromSubscription(SubscriptionKey, SubscriptionRegion);
speechConfig.OutputFormat = OutputFormat.Detailed;
speechConfig.SetSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat.Riff24Khz16BitMonoPcm);
speechConfig.SpeechSynthesisVoiceName = "de-DE-KatjaNeural";
speechConfig.SetProperty(PropertyId.Speech_LogFilename, LogServices.SpeechLogFilepath);
//
using (SpeechSynthesizer speechSynthesizer = new SpeechSynthesizer(speechConfig))
{
//
string strSsml = _textToSSML(Language, speechConfig.SpeechSynthesisVoiceName, strText);
//
SpeechSynthesisResult speechSynthesisResult = await speechSynthesizer.SpeakSsmlAsync(strSsml);
//
if (speechSynthesisResult.Reason == ResultReason.SynthesizingAudioCompleted)
{
// Process the wav and save as mp3
WaveFile waveFile = new WaveFile(speechSynthesisResult.AudioData);
_processAndSave(waveFile, audioFilepaths);
}
else
{
//
new LogServices().AddTextToSpeechError(speechSynthesisResult, strText);
}
}
Upvotes: 0
Views: 354
Reputation: 3649
The code below worked for me. It passes a pull-stream AudioConfig to the SpeechSynthesizer, so the audio goes to the stream instead of the speakers. It then uses AudioDataStream to save the Riff24Khz16BitMonoPcm result to a .wav file and converts that file to .mp3.
Code:
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using NAudio.Wave;
using NAudio.Lame;
public class TextToSpeechService
{
private string SubscriptionKey = "<speech_key>";
private string SubscriptionRegion = "<speech_region>";
public async Task GenerateSpeechAsync(string text, string outputFilePathWav, string outputFilePathMp3)
{
var speechConfig = SpeechConfig.FromSubscription(SubscriptionKey, SubscriptionRegion);
speechConfig.SpeechSynthesisVoiceName = "de-DE-KatjaNeural";
speechConfig.SetSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat.Riff24Khz16BitMonoPcm);
using var stream = AudioOutputStream.CreatePullStream();
var audioConfig = AudioConfig.FromStreamOutput(stream);
using var synthesizer = new SpeechSynthesizer(speechConfig, audioConfig);
string ssml = _textToSSML("de-DE", speechConfig.SpeechSynthesisVoiceName, text);
var result = await synthesizer.SpeakSsmlAsync(ssml);
if (result.Reason == ResultReason.SynthesizingAudioCompleted)
{
var directory = Path.GetDirectoryName(outputFilePathWav);
if (!string.IsNullOrEmpty(directory) && !Directory.Exists(directory))
{
Directory.CreateDirectory(directory);
}
using var audioDataStream = AudioDataStream.FromResult(result);
await audioDataStream.SaveToWaveFileAsync(outputFilePathWav);
ConvertWavToMp3(outputFilePathWav, outputFilePathMp3);
}
else if (result.Reason == ResultReason.Canceled)
{
var cancellation = SpeechSynthesisCancellationDetails.FromResult(result);
Console.WriteLine($"Error synthesizing audio: {cancellation.Reason}");
Console.WriteLine($"Error details: {cancellation.ErrorDetails}");
}
}
private string _textToSSML(string language, string voice, string text)
{
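// Declare the SSML namespace and escape the text so characters such as & or < do not break the XML.
return $"<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='{language}'><voice name='{voice}'>{System.Security.SecurityElement.Escape(text)}</voice></speak>";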
}
private void ConvertWavToMp3(string wavFilePath, string mp3FilePath)
{
using var reader = new WaveFileReader(wavFilePath);
using var writer = new LameMP3FileWriter(mp3FilePath, reader.WaveFormat, LAMEPreset.STANDARD);
reader.CopyTo(writer);
}
}
class Program
{
static async Task Main(string[] args)
{
var ttsService = new TextToSpeechService();
string text = "Hallo, wie geht es Ihnen?";
string outputFilePathWav = @"C:\Users\kamali\source\repos\ConsoleApp1\output.wav";
string outputFilePathMp3 = @"C:\Users\kamali\source\repos\ConsoleApp1\output.mp3";
await ttsService.GenerateSpeechAsync(text, outputFilePathWav, outputFilePathMp3);
Console.WriteLine("Speech synthesis completed. Audio saved to " + outputFilePathMp3);
}
}
Output:
The text-to-speech code ran successfully without playing any audio, and the result was saved as an .mp3 file at the path passed in outputFilePathMp3.
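If the audio is only needed in memory, as in the question's use of speechSynthesisResult.AudioData, a shorter variant may be enough. This is a minimal sketch, not a tested drop-in: it assumes that passing a null AudioConfig to the SpeechSynthesizer constructor stops the SDK from opening the default speaker. The class name SilentSynthesis and the method SynthesizeToBytesAsync are hypothetical names used only for illustration.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

public static class SilentSynthesis
{
    // Hypothetical helper: returns the RIFF/WAV bytes for the given SSML
    // without routing the audio to an output device.
    public static async Task<byte[]> SynthesizeToBytesAsync(string key, string region, string ssml)
    {
        var speechConfig = SpeechConfig.FromSubscription(key, region);
        speechConfig.SetSpeechSynthesisOutputFormat(
            SpeechSynthesisOutputFormat.Riff24Khz16BitMonoPcm);

        // "null as AudioConfig" selects the (SpeechConfig, AudioConfig) overload
        // explicitly; a bare null can be ambiguous with other constructors.
        using var synthesizer = new SpeechSynthesizer(speechConfig, null as AudioConfig);
        using var result = await synthesizer.SpeakSsmlAsync(ssml);

        if (result.Reason == ResultReason.Canceled)
        {
            var details = SpeechSynthesisCancellationDetails.FromResult(result);
            throw new InvalidOperationException(
                $"Synthesis canceled: {details.Reason}. {details.ErrorDetails}");
        }

        // AudioData holds the complete audio in the requested output format.
        return result.AudioData;
    }
}
```

The returned byte array could then be passed to the question's own WaveFile constructor in place of speechSynthesisResult.AudioData, with no intermediate .wav file on disk.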
Upvotes: 0