dannym25

Twilio Real-Time Media Streaming to WebSocket Receives Only Noise Instead of Speech

I'm setting up a Twilio Voice call with real-time media streaming to a WebSocket server for speech-to-text processing using Google Cloud Speech-to-Text. The connection is established successfully, and I receive a continuous stream of audio data from Twilio. However, when I play back the received audio, all I hear is a rapid clicking/jackhammering noise instead of the actual speech spoken during the call.

Setup:

1. Confirmed WebSocket Receives Data

• The WebSocket successfully logs incoming audio chunks from Twilio:

🔊 Received 379 bytes of audio from Twilio
🔊 Received 379 bytes of audio from Twilio

• This suggests Twilio is sending audio data, but it's not being interpreted correctly.

2. Saving and Playing Raw Audio

• I save the incoming raw mulaw (8000Hz) audio from Twilio to a file:

fs.appendFileSync('twilio-audio.raw', message);

• Then, I convert it to a .wav file using FFmpeg:

ffmpeg -f mulaw -ar 8000 -ac 1 -i twilio-audio.raw twilio-audio.wav

Problem: When I play the audio using ffplay, it contains no speech, only rapid clicking sounds.

3. Ensured Correct Audio Encoding

• Twilio sends mulaw 8000Hz mono format.

• Verified that my ffmpeg conversion is using the same settings.

• Tried different conversion methods:

ffmpeg -f mulaw -ar 8000 -ac 1 -i twilio-audio.raw -c:a pcm_s16le twilio-audio-fixed.wav

→ Same issue.
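To make explicit what the `-f mulaw` input format implies: each byte in the file is treated as one 8-bit G.711 mu-law sample and expanded to 16-bit linear PCM. Below is a rough sketch of that expansion in JavaScript, using the standard G.711 decoding steps. The names `mulawByteToPcm16` and `decodeMulaw` are illustrative only, and this is not a description of how ffmpeg implements it internally.

```javascript
// Expand one 8-bit mu-law byte to a signed 16-bit linear sample (G.711).
function mulawByteToPcm16(byte) {
    const u = ~byte & 0xff;            // mu-law bytes are stored bit-inverted
    const sign = u & 0x80;             // top bit: sign
    const exponent = (u >> 4) & 0x07;  // next three bits: segment number
    const mantissa = u & 0x0f;         // low four bits: step within the segment
    const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
    return sign ? -magnitude : magnitude;
}

// Decode a Buffer of mu-law bytes into an Int16Array of PCM samples.
function decodeMulaw(buf) {
    const pcm = new Int16Array(buf.length);
    for (let i = 0; i < buf.length; i++) {
        pcm[i] = mulawByteToPcm16(buf[i]);
    }
    return pcm;
}

console.log(Array.from(decodeMulaw(Buffer.from([0xff, 0x80, 0x00])))); // [ 0, 32124, -32124 ]
```

Because every byte value from 0 to 255 is a valid mu-law sample, a file containing anything other than mu-law bytes still converts without error and simply plays back as noise.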

4. Checked Google Speech-to-Text Input Format

• Google STT requires proper encoding configuration:

const request = {
    config: {
        encoding: 'MULAW',
        sampleRateHertz: 8000,
        languageCode: 'en-US',
    },
    interimResults: false,
};

• No errors from Google STT, but it never detects speech, likely because the input audio is just noise.

5. Confirmed That Raw Audio is Not a WAV File

• Since Twilio sends raw audio, I checked whether I needed to strip the header before processing.

• Tried manually extracting raw bytes, but the issue persists.

Current Theory:

Code Snippets:

Twilio <Stream> Setup in TwiML Response

// Setup assumed by the route below
const express = require('express');
const twilio = require('twilio');

const app = express();

app.post('/voice-response', (req, res) => {
    console.log("📞 Incoming call from Twilio");

    const twiml = new twilio.twiml.VoiceResponse();
    twiml.say("Hello! Welcome to the service. How can I help you?");

    // Prevent Twilio from hanging up too early
    twiml.pause({ length: 5 });

    // Stream the caller's audio to the WebSocket server
    twiml.connect().stream({
        url: `wss://your-ngrok-url/ws`,
        track: "inbound_track"
    });

    console.log("🛠️ Twilio Stream URL:", `wss://your-ngrok-url/ws`);

    res.type('text/xml').send(twiml.toString());
});

WebSocket Server Handling Twilio Audio Stream

const fs = require('fs');
const { exec } = require('child_process');

// `wss` is the WebSocket server instance (its setup is not shown here)
wss.on('connection', (ws) => {
    console.log("🔗 WebSocket Connected! Waiting for audio input...");

    ws.on('message', (message) => {
        console.log(`🔊 Received ${message.length} bytes of audio from Twilio`);

        // Save each incoming message as-is for debugging
        fs.appendFileSync('twilio-audio.raw', message);

        // Warn when a message is unusually small
        if (message.length < 100) {
            console.warn("⚠️ Warning: Audio data from Twilio is very small. Might be silent.");
        }
    });

    ws.on('close', () => {
        console.log("❌ WebSocket Disconnected!");

        // Convert the saved file to WAV for debugging
        exec(`ffmpeg -f mulaw -ar 8000 -ac 1 -i twilio-audio.raw twilio-audio.wav`, (err) => {
            if (err) console.error("❌ FFmpeg Conversion Error:", err);
            else console.log("✅ Twilio Audio Saved as `twilio-audio.wav`");
        });
    });

    ws.on('error', (error) => console.error("⚠️ WebSocket Error:", error));
});
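For comparison, Twilio's Media Streams documentation describes every WebSocket message as a JSON text frame with an `event` field (`connected`, `start`, `media`, `stop`), where `media` events carry the audio as base64 in `media.payload`. Here is a minimal sketch of unpacking that, assuming this framing applies to the messages received here. `extractMulaw` is a hypothetical helper name, not part of any Twilio library.

```javascript
// Hypothetical helper: returns the raw mulaw bytes carried by one Twilio
// Media Streams message, or null for anything that is not a media event.
function extractMulaw(message) {
    let parsed;
    try {
        // `message` may arrive as a Buffer or a string; both yield the JSON text.
        parsed = JSON.parse(message.toString('utf8'));
    } catch (err) {
        return null; // Not JSON, so not a Media Streams event
    }
    if (!parsed || parsed.event !== 'media' || !parsed.media ||
        typeof parsed.media.payload !== 'string') {
        return null; // connected/start/stop events carry no audio
    }
    return Buffer.from(parsed.media.payload, 'base64');
}

// Hand-built example message: the payload is four 0xFF bytes (mulaw silence).
const sample = JSON.stringify({ event: 'media', media: { payload: '/////w==' } });
console.log(extractMulaw(sample)); // <Buffer ff ff ff ff>
```

Under that assumption, the non-null result of `extractMulaw(message)` is what would be appended to `twilio-audio.raw` in the handler above, rather than `message` itself.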

Questions:

Additional Context:

Any help is greatly appreciated! 🙏
