Reputation: 1
I am implementing audio streaming with the OpenAI TTS model. The backend gets audio data from the OpenAI TTS model as a stream and sends it to the frontend over a WebSocket, where it is played back. The frontend is React.js and the backend is Node.js.
This is frontend code.
const audio_response = await openai.audio.speech.create({
  model: "tts-1",
  voice: "nova",
  input,
  response_format: "mp3",
});

// Get audio chunks from the stream and send via WebSocket
const stream = audio_response.body;

// Pipe the audio stream to the WebSocket in small chunks
stream.on("data", (chunk) => {
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(chunk); // Send audio data as binary chunks
  }
});
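In case it matters: depending on the openai SDK and Node version, audio_response.body can be a web ReadableStream that has no .on("data") method. A minimal sketch that iterates it with for await instead (assuming Node 18+ and the same openai, ws, and input variables as above):

const audio_response = await openai.audio.speech.create({
  model: "tts-1",
  voice: "nova",
  input,
  response_format: "mp3",
});

// Forward each chunk to the WebSocket as it arrives
for await (const chunk of audio_response.body) {
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(chunk); // chunk is a Uint8Array of MP3 bytes
  }
}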
And this is backend:
const socket = new WebSocket(...);
socket.binaryType = "blob";

// Web Audio API setup
let audioContext;
let source;
let audioBufferQueue = []; // Queue for audio chunks

socket.addEventListener("message", async (event) => {
  const audioChunk = event.data;
  audioBufferQueue.push(audioChunk);

  // Start playing audio if not already playing
  if (!source) {
    await playAudioQueue();
  }
});

async function playAudioQueue() {
  if (!audioContext) {
    audioContext = new (window.AudioContext || window.webkitAudioContext)();
  }

  while (audioBufferQueue.length > 0) {
    const audioChunk = audioBufferQueue.shift();

    // Decode audio data
    const arrayBuffer = await audioChunk.arrayBuffer();

    try {
      const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);

      // Play the audio buffer
      source = audioContext.createBufferSource();
      source.buffer = audioBuffer;
      source.connect(audioContext.destination);

      // Wait for the audio to finish playing
      await new Promise((resolve) => {
        source.onended = resolve;
        source.start();
      });

      source = null;
    } catch (err) {
      console.error("Error decoding audio data:", err);
    }
  }
}
Now this code throws an error like:
Error decoding audio data: EncodingError: Unable to decode audio data
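From what I've read, decodeAudioData expects a complete, self-contained audio file, so handing it arbitrary MP3 chunk boundaries is a likely reason for the failure. A minimal sketch of the non-streaming workaround, buffering every chunk and decoding once (assuming the server closes the socket, or otherwise signals end-of-audio, when the TTS stream finishes):

const audioContext = new (window.AudioContext || window.webkitAudioContext)();
const chunks = [];

socket.addEventListener("message", (event) => {
  chunks.push(event.data); // each event.data is a Blob (binaryType = "blob")
});

socket.addEventListener("close", async () => {
  // Reassemble the complete MP3 and decode it in one go
  const completeBlob = new Blob(chunks, { type: "audio/mpeg" });
  const arrayBuffer = await completeBlob.arrayBuffer();
  const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);

  const source = audioContext.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(audioContext.destination);
  source.start();
});

That should play, but it gives up on streaming, which is what I actually want to keep.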
Upvotes: 0
Views: 135
Reputation: 1
I guess the code above is actually the backend and the lower one is the front-end. Are you able to get the response as a stream from OpenAI? Because in Python the streaming call looks like this:
def audioTextStream(text):
    with client.audio.speech.with_streaming_response.create(
        model="tts-1", voice="alloy", input=text, response_format="pcm"
    ) as response:
        for chunk in response.iter_bytes(chunk_size=1024):
            yield chunk
And yes, I am facing the same issue on the front-end side: the chunks stream to the front-end, but when I try to decode them and make them audible, the content type shows up as "octet-stream". I am a backend dev and I don't have enough knowledge of the front-end. Let me know if you know how to handle it.
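If the audio is streamed as response_format "pcm" like in the Python function above, the front-end should not need decodeAudioData at all, because the chunks are already raw samples. A rough, untested sketch of playing them with the Web Audio API (assuming OpenAI's pcm output is 24 kHz, 16-bit signed little-endian mono, as the docs describe, and assuming each WebSocket message contains whole samples):

const audioContext = new (window.AudioContext || window.webkitAudioContext)();
let playbackTime = audioContext.currentTime;

socket.binaryType = "arraybuffer"; // raw bytes instead of Blobs

socket.addEventListener("message", (event) => {
  // Interpret the bytes as 16-bit signed integers and convert to Float32 in [-1, 1]
  const int16 = new Int16Array(event.data);
  const float32 = new Float32Array(int16.length);
  for (let i = 0; i < int16.length; i++) {
    float32[i] = int16[i] / 32768;
  }

  // Wrap the samples in an AudioBuffer (1 channel, 24 kHz) and schedule it
  // right after the previously scheduled chunk so playback stays gapless
  const buffer = audioContext.createBuffer(1, float32.length, 24000);
  buffer.copyToChannel(float32, 0);

  const source = audioContext.createBufferSource();
  source.buffer = buffer;
  source.connect(audioContext.destination);

  playbackTime = Math.max(playbackTime, audioContext.currentTime);
  source.start(playbackTime);
  playbackTime += buffer.duration;
});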
Upvotes: 0