Xavier Taylor
Xavier Taylor

Reputation: 1463

How do I create an expressjs endpoint that uses azure tts to send audio to a web app?

I am trying to figure out how to expose an express route (ie: Get api/word/:some_word) which uses the azure tts sdk (microsoft-cognitiveservices-speech-sdk) to generate an audio version of some_word (in any format playable by a browser), and res.send()'s the resulting audio, so that a front end javascript web app could consume the api in order to play the audio pronunciation of the word.

I have the azure sdk 'working' - it is creating an 'ArrayBuffer' inside my expressjs code. However, I do not know how to send the data in this ArrayBuffer to the front end. I have been following the instructions here: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/get-started-text-to-speech?tabs=import%2Cwindowsinstall&pivots=programming-language-javascript#get-result-as-an-in-memory-stream

Another way to phrase my question would be 'in express, I have an ArrayBuffer whose contents is an .mp3/.ogg/.wav file. How do I send that file via express? Do I need to convert it into some other data type(like a Base64 encoded string? A buffer?) Do I need to set some particular response headers?

Upvotes: 0

Views: 415

Answers (1)

Xavier Taylor
Xavier Taylor

Reputation: 1463

I finally figured it out seconds after asking this question 😂

I am pretty new to this area, so any pointers on how this could be improved would be appreciated.

app.get('/api/tts/word/:word', async (req, res) => {
  const word = req.params.word;
  const subscriptionKey = azureKey;
  const serviceRegion = 'australiaeast';

  const speechConfig = sdk.SpeechConfig.fromSubscription(
    subscriptionKey as string,
    serviceRegion
  );
  
  speechConfig.speechSynthesisOutputFormat =
    SpeechSynthesisOutputFormat.Ogg24Khz16BitMonoOpus;

  const synthesizer = new sdk.SpeechSynthesizer(speechConfig);

  synthesizer.speakSsmlAsync(
    `
    <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="zh-CN">
    <voice name="zh-CN-XiaoxiaoNeural">
            ${word}
    </voice>
    </speak>
    `,
    (resp) => {
      const audio = resp.audioData;
      synthesizer.close();
      const buffer = Buffer.from(audio);
      res.set('Content-Type', 'audio/ogg; codecs=opus; rate=24000');
      res.send(buffer);
    }
  );
});

Upvotes: 2

Related Questions