kevin Escalante

Reputation: 1

How to implement audio streaming in React + Node using the OpenAI TTS model

I am implementing audio streaming with the OpenAI TTS model: the backend (Node.js) receives audio data from the TTS endpoint as a stream and forwards it to the frontend (React.js) over a WebSocket, where it is played back.

This is the backend code.

const audio_response = await openai.audio.speech.create({
  model: "tts-1",
  voice: "nova",
  input,
  response_format: "mp3",
});

// Get audio chunks from the stream and send via websocket
const stream = audio_response.body;

// Pipe the audio stream to the WebSocket in small chunks
stream.on("data", (chunk) => {
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(chunk); // Send audio data as binary chunks
  }
});

And this is the frontend:

const socket = new WebSocket(...);
socket.binaryType = "blob";

// Web Audio API setup
let audioContext;
let source;
let audioBufferQueue = []; // Queue for audio chunks

socket.addEventListener("message", async (event) => {
  const audioChunk = event.data;
  audioBufferQueue.push(audioChunk);

  // Start playing audio if not already playing
  if (!source) {
    await playAudioQueue();
  }
});

async function playAudioQueue() {
  if (!audioContext) {
    audioContext = new (window.AudioContext || window.webkitAudioContext)();
  }

  while (audioBufferQueue.length > 0) {
    const audioChunk = audioBufferQueue.shift();

    // Decode audio data
    const arrayBuffer = await audioChunk.arrayBuffer();
    try {
      const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);

      // Play the audio buffer
      source = audioContext.createBufferSource();
      source.buffer = audioBuffer;
      source.connect(audioContext.destination);

      // Wait for the audio to finish playing
      await new Promise((resolve) => {
        source.onended = resolve;
        source.start();
      });

      source = null;
    } catch (err) {
      console.error("Error decoding audio data:", err);
    }
  }
}

This code fails with the following error:

Error decoding audio data: EncodingError: Unable to decode audio data
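For reference, `decodeAudioData` expects a complete, self-contained audio file, while each WebSocket message here carries only an arbitrary MP3 fragment, which is a likely cause of this `EncodingError`. A minimal workaround sketch (the helper name is illustrative) that buffers every chunk and decodes only once the stream has ended:

```javascript
// decodeAudioData cannot decode partial MP3 fragments, so one workaround
// is to collect all chunks first and decode a single complete buffer.
// This gives up low-latency streaming but confirms the pipeline works.
function concatChunks(chunks) {
  // chunks: array of ArrayBuffers received over the WebSocket
  const total = chunks.reduce((n, c) => n + c.byteLength, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(new Uint8Array(c), offset);
    offset += c.byteLength;
  }
  return out.buffer;
}

// In the browser (sketch):
// const audioBuffer = await audioContext.decodeAudioData(concatChunks(received));
```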

Upvotes: 0

Views: 135

Answers (1)

Pankaj Jarial

Reputation: 1

I think the snippets are labelled the wrong way around: the upper one is the backend and the lower one is the frontend. Are you able to get a streaming response from OpenAI at all? In Python, the streaming call looks like this:

    # Assumes `client = OpenAI()` from the openai Python SDK
    def audioTextStream(text):
        with client.audio.speech.with_streaming_response.create(
            model="tts-1", voice="alloy", input=text, response_format="pcm"
        ) as response:
            for chunk in response.iter_bytes(chunk_size=1024):
                yield chunk

And yes, I am facing the same issue on the frontend side: the chunks are streaming to the frontend, but when I try to decode them for playback the content type shows as "octet-stream". I am a backend developer and don't have enough frontend knowledge, so let me know if you know how to handle it.
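Since `response_format: "pcm"` returns raw 16-bit little-endian samples (24 kHz, mono, no container per the OpenAI docs), the frontend can skip `decodeAudioData` entirely and convert each chunk to float samples itself. A sketch of that conversion (the function name is illustrative):

```javascript
// Convert a chunk of raw 16-bit little-endian PCM into Float32 samples
// in the [-1, 1) range that Web Audio AudioBuffers expect.
function pcm16ToFloat32(arrayBuffer) {
  const int16 = new Int16Array(arrayBuffer);
  const float32 = new Float32Array(int16.length);
  for (let i = 0; i < int16.length; i++) {
    float32[i] = int16[i] / 32768; // scale signed 16-bit to [-1, 1)
  }
  return float32;
}

// In the browser, each converted chunk can then be copied into an
// AudioBuffer and scheduled for playback (sketch):
// const buffer = audioContext.createBuffer(1, float32.length, 24000);
// buffer.copyToChannel(float32, 0);
```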

Upvotes: 0
