Swimburger

Reputation: 7164

How to convert PCM S16LE audio to MU-LAW/8000 using .NET (Windows/Mac/Linux)

I'm trying to receive and transmit audio between a Twilio voice call and a Discord voice channel. I'm struggling to figure out how to convert the audio data I'm receiving from DSharpPlus (Discord library for .NET) to the format required by Twilio Voice.

If I'm reading DSharpPlus' docs correctly, the PCM data coming from DSharpPlus is in PCM S16LE format. Twilio expects the data to be in MU-LAW/8000 format (excluding headers I believe).

I am trying to use NAudio to convert the data, but all I hear through the phone is sharp, painful noise. I cannot use the full NAudio library because this project should work on Windows/Mac/Linux and some NAudio APIs are Windows-only.
Here's the relevant code I currently have:

private async Task VoiceReceiveHandler(VoiceNextConnection connection, VoiceReceiveEventArgs args)
{
    
    if (twilioSocketConnectionManager.TryGetSocketById(socketId, out var twilioSocket) && twilioSocket.Socket.State == WebSocketState.Open)
    {
        var media = ConvertPcmToMulawBase64Encoded(args.AudioFormat, args.PcmData.ToArray());
        var json = JsonSerializer.Serialize<MediaMessage>
        (
            new MediaMessage("media", twilioSocket.StreamSid, new MediaPayload(media)), 
            jsonSerializerOptions
        );
        logger.LogInformation(json);
        var bytes = Encoding.Default.GetBytes(json);
        var arraySegment = new ArraySegment<byte>(bytes, 0, bytes.Length);
        await twilioSocket.Socket.SendAsync(arraySegment, WebSocketMessageType.Text, WebSocketMessageFlags.EndOfMessage, CancellationToken.None);
    }
}

private static string ConvertPcmToMulawBase64Encoded(AudioFormat audioFormat, byte[] pcmData)
{
        
    var sourceFormat = new WaveFormat(audioFormat.SampleRate, 16, audioFormat.ChannelCount);
    return Convert.ToBase64String(EncodeMuLaw(pcmData, 0, pcmData.Length));
}

public static byte[] EncodeMuLaw(byte[] data, int offset, int length)
{
    var encoded = new byte[length / 2];
    int outIndex = 0;
    for(int n = 0; n < length; n+=2)
    {
        encoded[outIndex++] = MuLawEncoder.LinearToMuLawSample(BitConverter.ToInt16(data, offset + n));
    }
    return encoded;
}

I will also need to convert from MU-LAW back to PCM S16LE, but first things first.
I'm completely oblivious when it comes to audio processing, so go easy on me.

Here's the rest of the source code: https://github.com/Swimburger/DiscordTwilioVoiceBot

Essentially my question is, how do I convert PCM S16LE audio to MU-LAW/8000 using .NET while supporting Windows/Linux/Mac?


Update 1:

Folks have suggested using ffmpeg instead of NAudio, which I think I'm doing correctly here, but I still hear sharp noise instead of the actual audio.

private async Task VoiceReceiveHandler(VoiceNextConnection connection, VoiceReceiveEventArgs args)
{
    var ffmpeg = Process.Start(new ProcessStartInfo
    {
        FileName = "ffmpeg",
        Arguments = $@"-hide_banner -ac 2 -f s16le -ar 48000 -i pipe:0 -c:a pcm_mulaw -f mulaw -ar 8000 -ac 1 pipe:1",
        RedirectStandardInput = true,
        RedirectStandardOutput = true
    });

    //byte[] trimmedData = new byte[args.PcmData.Length - 44];
    //Buffer.BlockCopy(args.PcmData.ToArray(), 44, trimmedData, 0, trimmedData.Length);

    await ffmpeg.StandardInput.BaseStream.WriteAsync(args.PcmData);
    ffmpeg.StandardInput.Close();
    byte[] data;
    using(var memoryStream = new MemoryStream())
    {
        ffmpeg.StandardOutput.BaseStream.CopyTo(memoryStream);
        data = memoryStream.ToArray();
    }
    ffmpeg.Dispose();

    //byte[] trimmedData = new byte[data.Length - 44];
    //Buffer.BlockCopy(data, 44, trimmedData, 0, trimmedData.Length);

    //return;

    if (twilioSocketConnectionManager.TryGetSocketById(socketId, out var twilioSocket) && twilioSocket.Socket.State == WebSocketState.Open)
    {
        var json = JsonSerializer.Serialize<MediaMessage>
        (
            new MediaMessage("media", twilioSocket.StreamSid, new MediaPayload(Convert.ToBase64String(data))), 
            jsonSerializerOptions
        );
        logger.LogInformation(json);
        var bytes = Encoding.Default.GetBytes(json);
        var arraySegment = new ArraySegment<byte>(bytes, 0, bytes.Length);
        await twilioSocket.Socket.SendAsync(arraySegment, WebSocketMessageType.Text, WebSocketMessageFlags.EndOfMessage, CancellationToken.None);
    }
}

This is on a separate branch.

Upvotes: 4

Views: 1467

Answers (1)

Matthew Gilliard

Reputation: 9498

I had a similar problem converting Twilio's MU-LAW to PCM S16LE to stream to the Azure Cognitive Services transcription service. I was writing in Java rather than .NET, and I did not find a good library solution.

However, the conversion can be done one byte at a time with a lookup table (note that one byte of mu-law decodes to 2 bytes of PCM). There is a rather abstract description of the algorithm on Wikipedia, and I found this .NET repo's code easy to translate to Java, and it worked fine. For your case you will want to look in MulawDecoder.cs.
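
To make the lookup-table idea concrete for .NET, here is a minimal sketch of the decode direction (mu-law to PCM S16LE). It follows the standard G.711 mu-law expansion, which is not necessarily identical to the code in the repo mentioned above. The class name MuLawDecoderSketch is made up for illustration, and the sketch assumes the input is raw, headerless mu-law bytes, as the question describes for Twilio.

```csharp
using System;

// Hypothetical helper for illustration; not part of NAudio, DSharpPlus, or Twilio.
public static class MuLawDecoderSketch
{
    // One 16-bit PCM value for each of the 256 possible mu-law bytes.
    private static readonly short[] Table = BuildTable();

    private static short[] BuildTable()
    {
        var table = new short[256];
        for (int i = 0; i < 256; i++)
        {
            int u = ~i & 0xFF;               // mu-law bytes are stored bit-inverted
            int exponent = (u >> 4) & 0x07;  // 3-bit segment number
            int mantissa = u & 0x0F;         // 4-bit step within the segment
            int magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
            table[i] = (short)((u & 0x80) != 0 ? -magnitude : magnitude);
        }
        return table;
    }

    // Expands each mu-law byte into one little-endian 16-bit sample (2 bytes).
    public static byte[] Decode(byte[] mulaw)
    {
        var pcm = new byte[mulaw.Length * 2];
        for (int i = 0; i < mulaw.Length; i++)
        {
            short sample = Table[mulaw[i]];
            pcm[2 * i] = (byte)(sample & 0xFF);            // low byte first
            pcm[2 * i + 1] = (byte)((sample >> 8) & 0xFF); // then high byte
        }
        return pcm;
    }
}
```

The output is still 8 kHz mono. Going by the ffmpeg arguments in the question, the Discord side works with 48 kHz stereo, so the decoded audio would presumably need upsampling and channel duplication before it can be played there.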

My resulting Java code for mulaw->pcm is here.
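
For the direction the question actually asks about (PCM S16LE to MU-LAW/8000), compressing each sample is only half of the job: the audio also has to become 8 kHz mono first. Below is a rough sketch, assuming the input is 48 kHz stereo as the ffmpeg arguments in the question suggest. It averages every 6 stereo frames (12 samples, 24 bytes) into one mono sample and then applies the standard G.711 mu-law compression. The class and method names are hypothetical, and the averaging is only a crude low-pass filter, so a real resampler would likely sound better.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper for illustration; assumes 48 kHz, 2-channel, 16-bit little-endian input.
public static class MuLawEncoderSketch
{
    private const int Bias = 0x84;   // standard G.711 bias
    private const int Clip = 32635;  // largest magnitude that survives adding the bias

    public static byte LinearToMuLaw(short pcm)
    {
        int sample = pcm;
        int sign = (sample >> 8) & 0x80;   // 0x80 for negative input
        if (sign != 0) sample = -sample;
        if (sample > Clip) sample = Clip;
        sample += Bias;

        // Find the segment: position of the highest set bit between bit 14 and bit 7.
        int exponent = 7;
        for (int mask = 0x4000; (sample & mask) == 0 && exponent > 0; mask >>= 1)
        {
            exponent--;
        }
        int mantissa = (sample >> (exponent + 3)) & 0x0F;
        return (byte)(~(sign | (exponent << 4) | mantissa) & 0xFF); // stored bit-inverted
    }

    // 48000 / 8000 = 6 frames per output sample; 6 frames * 2 channels * 2 bytes = 24 bytes.
    public static byte[] Pcm48kStereoToMuLaw8k(ReadOnlySpan<byte> pcm)
    {
        const int BytesPerGroup = 24;
        const int SamplesPerGroup = 12;
        int outCount = pcm.Length / BytesPerGroup; // trailing partial group is dropped
        var encoded = new byte[outCount];
        for (int i = 0; i < outCount; i++)
        {
            int sum = 0;
            int start = i * BytesPerGroup;
            for (int b = 0; b < BytesPerGroup; b += 2)
            {
                sum += BinaryPrimitives.ReadInt16LittleEndian(pcm.Slice(start + b, 2));
            }
            encoded[i] = LinearToMuLaw((short)(sum / SamplesPerGroup));
        }
        return encoded;
    }
}
```

If the packets cover 20 ms of audio (an assumption, not something stated in the question), each one is 3840 bytes of PCM and comes out as 160 bytes of mu-law, which can then be Base64-encoded the same way the question's code already does.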

Upvotes: 1
