I am currently working on live language translation between two callers using Twilio and the OpenAI Realtime API. I fetch the audio stream from Twilio and push it to the OpenAI WebSocket as shown below.
// Wrap the base64 audio payload from Twilio in an OpenAI append event
const audioAppend = {
  type: "input_audio_buffer.append",
  audio: data.media.payload,
};
// Forward the audio only when the OpenAI socket exists and is open
if (
  client.callerOpenAiSocket != null &&
  client.callerOpenAiSocket.readyState === WebSocket.OPEN
) {
  client.callerOpenAiSocket.send(JSON.stringify(audioAppend));
} else {
  //console.log("Please wait until OpenAI is initialized");
}
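For readers following along, the forwarding step above can be sketched as a pair of small functions. This is only a minimal sketch: `toAudioAppend` and `forwardAudio` are hypothetical helper names that are not part of my actual code, and it assumes each Twilio Media Streams message arrives as a JSON string whose `event` field is `"media"` when it carries audio.

```javascript
// Numeric value of WebSocket.OPEN, used so the sketch needs no imports.
const OPEN = 1;

// Hypothetical helper: converts one raw Twilio message into an
// "input_audio_buffer.append" event, or returns null if it is not audio.
function toAudioAppend(rawMessage) {
  let data;
  try {
    data = JSON.parse(rawMessage);
  } catch (err) {
    return null; // not valid JSON, nothing to forward
  }
  if (!data || data.event !== "media" || !data.media || !data.media.payload) {
    return null; // "start", "stop", "mark", etc. carry no audio
  }
  return { type: "input_audio_buffer.append", audio: data.media.payload };
}

// Hypothetical helper: sends the event only if the socket exists and is open.
// Returns true when something was actually sent.
function forwardAudio(socket, rawMessage) {
  const event = toAudioAppend(rawMessage);
  if (event === null) return false;
  if (socket == null || socket.readyState !== OPEN) return false;
  socket.send(JSON.stringify(event));
  return true;
}
```

With this shape, the Twilio message handler would reduce to a single call such as `forwardAudio(client.callerOpenAiSocket, message)`, and the boolean result shows whether a frame was dropped because the OpenAI socket was not ready.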
On the OpenAI socket, this is the session update I am sending:
this.callersessionUpdate = {
  type: "session.update",
  session: {
    turn_detection: {
      type: "server_vad",
      threshold: 0.5,
      prefix_padding_ms: 300,
      silence_duration_ms: 500,
    },
    input_audio_format: "g711_ulaw",
    output_audio_format: "g711_ulaw",
    voice: this.voice,
    instructions: this.callerPrompt,
    modalities: ["text", "audio"],
    temperature: 0.8,
    max_response_output_tokens: 100,
    input_audio_transcription: {
      model: "whisper-1",
    },
  },
};
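To make the configuration step easier to discuss, here is a minimal sketch of building the same payload in one place and sending it a single time as soon as the socket is open. `buildSessionUpdate` and `sendSessionUpdateOnce` are hypothetical names, not my current code, and the sketch assumes the socket comes from the Node `ws` package, whose sockets expose `once("open", ...)`. Whether one update is enough for this use case is an assumption, not something I have confirmed.

```javascript
// Numeric value of WebSocket.OPEN, used so the sketch needs no imports.
const OPEN_STATE = 1;

// Hypothetical factory: returns the session.update event for one caller,
// using the same settings as the object shown above.
function buildSessionUpdate(voice, instructions) {
  return {
    type: "session.update",
    session: {
      turn_detection: {
        type: "server_vad",
        threshold: 0.5,
        prefix_padding_ms: 300,
        silence_duration_ms: 500,
      },
      input_audio_format: "g711_ulaw",
      output_audio_format: "g711_ulaw",
      voice: voice,
      instructions: instructions,
      modalities: ["text", "audio"],
      temperature: 0.8,
      max_response_output_tokens: 100,
      input_audio_transcription: { model: "whisper-1" },
    },
  };
}

// Hypothetical helper: sends the update immediately if the socket is already
// open, otherwise exactly once when the "open" event fires.
function sendSessionUpdateOnce(socket, update) {
  const send = () => socket.send(JSON.stringify(update));
  if (socket.readyState === OPEN_STATE) {
    send();
  } else {
    socket.once("open", send);
  }
}
```

A call such as `sendSessionUpdateOnce(client.callerOpenAiSocket, buildSessionUpdate(this.voice, this.callerPrompt))` would replace any timer-based sending in this sketch.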
The prompt I use for the language translation is as follows:
You are an AI assistant designed to process Telugu audio. Please perform the following tasks accurately and concisely:
- Task: Listen to the provided Telugu audio and transcribe it into written Telugu text.
- Translate: Translate the transcribed Telugu text into English.
- Output: Provide English translation clearly.
Do not include any additional information, context, or explanations. Ensure that all responses are complete and clear.
The issues I'm facing now are:
NOTE: I'm sending the session updates for 3 seconds. Can anyone guide me in addressing these issues?