I'm setting up a Twilio Voice call with real-time media streaming to a WebSocket server for speech-to-text processing using Google Cloud Speech-to-Text. The connection is established successfully, and I receive a continuous stream of audio data from Twilio. However, when I play back the received audio, all I hear is a rapid clicking/jackhammering noise instead of the actual speech spoken during the call.
Setup:
1. Confirmed WebSocket Receives Data
• The WebSocket successfully logs incoming audio chunks from Twilio:
🔊 Received 379 bytes of audio from Twilio
🔊 Received 379 bytes of audio from Twilio
• This suggests Twilio is sending audio data, but it's not being interpreted correctly.
2. Saving and Playing Raw Audio
• I save the incoming raw mulaw (8000Hz) audio from Twilio to a file:
fs.appendFileSync('twilio-audio.raw', message);
• Then, I convert it to a .wav file using FFmpeg:
ffmpeg -f mulaw -ar 8000 -ac 1 -i twilio-audio.raw twilio-audio.wav
• Problem: When I play the audio using ffplay, it contains no speech, only rapid clicking sounds.
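One thing worth ruling out at this step: Twilio's Media Streams documentation describes each WebSocket message as a JSON text frame with an `event` field (`connected`, `start`, `media`, `stop`, ...), where the audio sits base64-encoded in `media.payload`. If that applies here, appending `message` directly would write JSON text into the file rather than mulaw samples. A minimal sketch of extracting the audio bytes first; the helper name `mediaPayloadToBuffer` is hypothetical and not part of any Twilio library:

```javascript
// Hypothetical helper: returns the decoded mulaw bytes for a Twilio "media"
// message, or null for any other message (connected, start, stop, ...).
function mediaPayloadToBuffer(message) {
  let parsed;
  try {
    parsed = JSON.parse(message.toString('utf8'));
  } catch (err) {
    return null; // not JSON at all
  }
  if (!parsed || parsed.event !== 'media') return null;
  if (!parsed.media || typeof parsed.media.payload !== 'string') return null;
  // The payload is base64 text; decode it to the raw audio bytes.
  return Buffer.from(parsed.media.payload, 'base64');
}
```

Inside the `message` handler this would replace the direct append, for example `const audio = mediaPayloadToBuffer(message); if (audio) fs.appendFileSync('twilio-audio.raw', audio);`.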
3. Ensured Correct Audio Encoding
• Twilio sends mulaw 8000Hz mono format.
• Verified that my ffmpeg conversion is using the same settings.
• Tried different conversion methods:
ffmpeg -f mulaw -ar 8000 -ac 1 -i twilio-audio.raw -c:a pcm_s16le twilio-audio-fixed.wav
→ Same issue.
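To sanity-check the file independently of FFmpeg, the bytes can be decoded in Node with the standard G.711 mu-law expansion. In mu-law, silence encodes as bytes near 0xFF or 0x7F, whereas printable ASCII text (JSON or base64, for example) has the top bit clear, so every such byte decodes to a negative sample, some of them thousands in magnitude. A rough sketch; `mulawToPcm16`, `decodeMulaw`, and `negativeShare` are illustrative names:

```javascript
// Standard G.711 mu-law expansion of one byte to a signed 16-bit sample.
function mulawToPcm16(byte) {
  const u = ~byte & 0xff;            // mu-law bytes are stored bit-inverted
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return (u & 0x80) ? -magnitude : magnitude;
}

// Decode a whole buffer of mu-law bytes into 16-bit PCM samples.
function decodeMulaw(buf) {
  const pcm = new Int16Array(buf.length);
  for (let i = 0; i < buf.length; i++) pcm[i] = mulawToPcm16(buf[i]);
  return pcm;
}

// Share of strictly negative samples; an illustrative diagnostic only.
function negativeShare(pcm) {
  if (pcm.length === 0) return 0;
  let negatives = 0;
  for (const sample of pcm) if (sample < 0) negatives++;
  return negatives / pcm.length;
}
```

Running `negativeShare(decodeMulaw(fs.readFileSync('twilio-audio.raw')))` on real speech would be expected to land well between 0 and 1; a value of exactly 1 would suggest the file holds text rather than audio.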
4. Checked Google Speech-to-Text Input Format
• Google STT requires proper encoding configuration:
const request = {
  config: {
    encoding: 'MULAW',
    sampleRateHertz: 8000,
    languageCode: 'en-US',
  },
  interimResults: false,
};
• No errors from Google STT, but it never detects speech, likely because the input audio is just noise.
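The hard-coded values above match what Twilio documents for Media Streams, but the `start` message also carries a `mediaFormat` object (`encoding`, `sampleRate`, `channels`) that can confirm them at runtime. A small sketch that derives the request from it; `sttConfigFromStart` is a hypothetical helper, and mapping `audio/x-mulaw` to `'MULAW'` is the only case it handles:

```javascript
// Hypothetical helper: builds a streaming request from Twilio's "start"
// message. Returns null for other events; throws on invalid JSON or on a
// media format other than mono audio/x-mulaw.
function sttConfigFromStart(message, languageCode = 'en-US') {
  const parsed = JSON.parse(message.toString('utf8'));
  if (!parsed || parsed.event !== 'start') return null;
  if (!parsed.start || !parsed.start.mediaFormat) return null;

  const { encoding, sampleRate, channels } = parsed.start.mediaFormat;
  if (encoding !== 'audio/x-mulaw' || channels !== 1) {
    throw new Error(`Unexpected media format: ${encoding}, ${channels} channel(s)`);
  }
  return {
    config: {
      encoding: 'MULAW',
      sampleRateHertz: Number(sampleRate),
      languageCode,
    },
    interimResults: false,
  };
}
```

Whatever request is used, only decoded audio bytes (not the JSON envelope) should presumably be written to the recognition stream, otherwise the recognizer receives the same noise that ffplay plays back.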
5. Confirmed That Raw Audio is Not a WAV File
• Since Twilio sends raw audio, I checked whether I needed to strip the header before processing.
• Tried manually extracting raw bytes, but the issue persists.
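A quick way to see what the saved file actually starts with is to look at its first bytes: a WAV file begins with the ASCII tag `RIFF` (with `WAVE` at offset 8), JSON text begins with `{`, and headerless mulaw has no signature at all. A rough heuristic sketch; `sniffFormat` and its return labels are made up for illustration, and raw audio could in principle start with a matching byte by coincidence:

```javascript
// Heuristic guess at what a buffer contains, based on its leading bytes.
function sniffFormat(buf) {
  if (buf.length >= 12 &&
      buf.toString('latin1', 0, 4) === 'RIFF' &&
      buf.toString('latin1', 8, 12) === 'WAVE') {
    return 'wav';
  }
  // Skip leading whitespace (space, tab, LF, CR) before looking for '{'.
  let i = 0;
  while (i < buf.length && [0x20, 0x09, 0x0a, 0x0d].includes(buf[i])) i++;
  if (i < buf.length && buf[i] === 0x7b) return 'json-text';
  return 'unknown-or-raw';
}
```

For example, `sniffFormat(fs.readFileSync('twilio-audio.raw'))` returning `'json-text'` would indicate that the file contains the WebSocket messages themselves rather than audio samples.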
Current Theory:
• The <Stream> tag expects a WebSocket connection starting with wss:// instead of https://, and switching to wss:// partially fixed some previous connection issues.
Code Snippets:
Twilio <Stream> Setup in TwiML Response
app.post('/voice-response', (req, res) => {
  console.log("📞 Incoming call from Twilio");
  const twiml = new twilio.twiml.VoiceResponse();
  twiml.say("Hello! Welcome to the service. How can I help you?");
  // Prevent Twilio from hanging up too early
  twiml.pause({ length: 5 });
  twiml.connect().stream({
    url: `wss://your-ngrok-url/ws`,
    track: "inbound_track"
  });
  console.log("🛠️ Twilio Stream URL:", `wss://your-ngrok-url/ws`);
  res.type('text/xml').send(twiml.toString());
});
WebSocket Server Handling Twilio Audio Stream
wss.on('connection', (ws) => {
  console.log("🔗 WebSocket Connected! Waiting for audio input...");
  ws.on('message', (message) => {
    console.log(`🔊 Received ${message.length} bytes of audio from Twilio`);
    // Save raw audio data for debugging
    fs.appendFileSync('twilio-audio.raw', message);
    // Check if audio is non-empty but contains only noise
    if (message.length < 100) {
      console.warn("⚠️ Warning: Audio data from Twilio is very small. Might be silent.");
    }
  });
  ws.on('close', () => {
    console.log("❌ WebSocket Disconnected!");
    // Convert Twilio audio for debugging
    exec(`ffmpeg -f mulaw -ar 8000 -ac 1 -i twilio-audio.raw twilio-audio.wav`, (err) => {
      if (err) console.error("❌ FFmpeg Conversion Error:", err);
      else console.log("✅ Twilio Audio Saved as `twilio-audio.wav`");
    });
  });
  ws.on('error', (error) => console.error("⚠️ WebSocket Error:", error));
});
Questions:
• … mulaw format when streaming audio over WebSockets?
Additional Context:
• <Stream> is connected and receiving data (confirmed by logs).
Any help is greatly appreciated! 🙏