How to avoid audio distortion while combining 2 channels into 1

Question

I've got a .NET application for Mac, and all it needs to do is record audio and write it to a WAV file.

I'm using a C# wrapper for PortAudio to record the audio. I've managed to record and write a WAV file successfully for one-channel audio, and the same for two-channel. But when I try to record two-channel audio and then write it as one-channel audio, it becomes distorted. The two channels are combined by taking interleaved pairs of samples, adding them together, and dividing by 2, with steps taken to try to avoid overflow.

And yet when combining the channels, the end result becomes distorted. I'm convinced it's something basic and easy that I'm doing horrifically wrong but I can no longer spot it.

The code for recording:


        private static readonly int _sampleRate = 44100;
        private static int _totalSamplesWritten = 0;
        private const ushort BIT_DEPTH = 16;

        var param = new StreamParameters
        {
            device = _indexOfDevice,
            channelCount = device.maxInputChannels > 1 ? 2 : 1,
            sampleFormat = SampleFormat.Int16,
            suggestedLatency = device.defaultLowInputLatency,
            hostApiSpecificStreamInfo = IntPtr.Zero
        };

        StreamCallbackResult CallbackStereoInput(
            IntPtr input, 
            IntPtr output, 
            uint frameCount, 
            ref StreamCallbackTimeInfo timeInfo, 
            StreamCallbackFlags statusFlags, 
            IntPtr userData
        )
        {
            var samples = new short[frameCount];
            Marshal.Copy(input, samples, 0, (int)frameCount);
            
            for (var i = 0; i < frameCount; i++)
            {
                var sampleL = samples[i];
                var overflowSafeSampleL = Convert.ToInt32(sampleL);
                var sampleR = samples[i + 1];
                var overflowSafeSampleR = Convert.ToInt32(sampleR);
                
                var combinedSample = overflowSafeSampleL + overflowSafeSampleR;
                var dividedSample = Convert.ToInt16(combinedSample / 2);
                
                _outputFileWriter.Write(dividedSample);
                i++;
            }
            
            _totalSamplesWritten += (int)frameCount;

            return StreamCallbackResult.Continue;
        }

        _stream = new PortAudioSharp.Stream(
            inParams: param, outParams: null, 
            sampleRate: _sampleRate,
            framesPerBuffer: 256,
            streamFlags: StreamFlags.ClipOff,
            callback: param.channelCount > 1 ? CallbackStereoInput : CallbackMonoInput,
            userData: IntPtr.Zero
        );
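
A possible explanation, offered as a sketch rather than a confirmed diagnosis: in PortAudio, `frameCount` counts frames, and with two interleaved channels each frame holds two samples, so the input buffer contains `frameCount * 2` shorts. The callback above copies only `frameCount` shorts and advances `i` twice per iteration, so it appears to write `frameCount / 2` mono samples per callback and drop the second half of every buffer. That would account for both the choppy sound and the halved duration. The helper below is hypothetical (`StereoDownmix.ToMono` is not part of PortAudio or the wrapper) and only shows the averaging over a full interleaved buffer:

```csharp
using System;

public static class StereoDownmix
{
    // Averages each left/right pair of an interleaved 16-bit stereo buffer
    // (L0, R0, L1, R1, ...) into one mono sample per frame.
    public static short[] ToMono(short[] interleaved, int frameCount)
    {
        if (interleaved.Length < frameCount * 2)
            throw new ArgumentException("Expected frameCount * 2 interleaved samples.");

        var mono = new short[frameCount];
        for (var frame = 0; frame < frameCount; frame++)
        {
            // Widen to int before adding so the sum cannot overflow 16 bits.
            int left = interleaved[2 * frame];
            int right = interleaved[2 * frame + 1];

            // The average of two 16-bit values always fits back into 16 bits.
            mono[frame] = (short)((left + right) / 2);
        }
        return mono;
    }
}
```

Inside the callback this would presumably be used by allocating `new short[frameCount * 2]`, copying with `Marshal.Copy(input, samples, 0, samples.Length)`, and writing every element of the returned array. Each callback then produces `frameCount` mono samples, which matches what `_totalSamplesWritten` already counts.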

The code for writing the WAV header:

WriteWavHeader(_outputFileWriter, 1, BIT_DEPTH, _sampleRate / 2, _totalSamplesWritten);

private static void WriteWavHeader(BinaryWriter writer, ushort channelCount, ushort bitDepth, int sampleRate, int totalSampleCount)
{
    writer.Seek(0, SeekOrigin.Begin);
    writer.Write(Encoding.ASCII.GetBytes("RIFF"));
    writer.Write((bitDepth / 8 * totalSampleCount) + 36);
    writer.Write(Encoding.ASCII.GetBytes("WAVE"));
    writer.Write(Encoding.ASCII.GetBytes("fmt "));
    writer.Write(16);
    writer.Write((ushort)1);
    writer.Write(channelCount);
    writer.Write(sampleRate);
    writer.Write(sampleRate * channelCount * bitDepth / 8);
    writer.Write((ushort)(channelCount * bitDepth / 8));
    writer.Write(bitDepth);
    writer.Write(Encoding.ASCII.GetBytes("data"));
    writer.Write(bitDepth / 8 * totalSampleCount);
}
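
For reference, here is a sketch of how the header fields relate to each other in a canonical 44-byte PCM header. It assumes the data section really contains one sample per channel per captured frame, in which case the true capture rate can be passed instead of `_sampleRate / 2`. `WavHeaderSketch.Build` and `totalFrameCount` are hypothetical names used only for illustration:

```csharp
using System;
using System.IO;
using System.Text;

public static class WavHeaderSketch
{
    // Builds a 44-byte PCM WAV header. totalFrameCount is the number of
    // frames in the data chunk (one sample per channel per frame).
    public static byte[] Build(ushort channelCount, ushort bitDepth, int sampleRate, int totalFrameCount)
    {
        int blockAlign = channelCount * bitDepth / 8;   // bytes per frame
        int dataBytes = blockAlign * totalFrameCount;   // size of the data chunk

        using (var buffer = new MemoryStream())
        using (var writer = new BinaryWriter(buffer))
        {
            writer.Write(Encoding.ASCII.GetBytes("RIFF"));
            writer.Write(36 + dataBytes);               // RIFF chunk size
            writer.Write(Encoding.ASCII.GetBytes("WAVE"));
            writer.Write(Encoding.ASCII.GetBytes("fmt "));
            writer.Write(16);                           // fmt chunk size for PCM
            writer.Write((ushort)1);                    // audio format 1 = PCM
            writer.Write(channelCount);
            writer.Write(sampleRate);
            writer.Write(sampleRate * blockAlign);      // byte rate
            writer.Write((ushort)blockAlign);
            writer.Write(bitDepth);
            writer.Write(Encoding.ASCII.GetBytes("data"));
            writer.Write(dataBytes);
            writer.Flush();
            return buffer.ToArray();
        }
    }
}
```

If the declared rate or data size disagrees with what was actually written, players will typically either play at the wrong speed or read past the end of the audio, so both values need to describe the samples that are really in the file.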

I don't care too much about loss of audio quality as long as you can clearly and easily understand human speech from it.

An example of the distorted audio:

https://www.dropbox.com/scl/fi/5204zxtpjkqa4ewxhph0j/Audio.wav?rlkey=9urdo4s0zqtyd3hxi1oscko9b&st=w02xj8pm&dl=0

The code above is the end point after multiple different attempts.

Currently, the sample rate written to the WAV header is half the rate we record at; without that change, the audio is half as long and plays back at double speed.
The sample rate used to be 16000, which seemed to produce slightly better results, but from everything I've read online 44100 seems like a better bet.
Initially, combining the channels produced just static, but casting the samples to 32 bits while combining them seems to help avoid overflow.
I've tried different sampleFormats, and I've tried non-interleaved audio, but neither seems to make anything better or worse.
It definitely can work: recording 2-channel audio and writing it as 2-channel audio works nicely, so I'm fairly confident it's not a hardware issue.
The main reason I'm trying to combine 2 channels into 1 is that I need the final WAV file to be as small as possible. I need fine audio quality, and anything that saves space is a bonus; getting 2 channels into 1 would be a massive file-size saver.
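
Since file size is the goal, the arithmetic for uncompressed PCM is worth spelling out: size grows linearly with sample rate, channel count, and bit depth. The helper below is a hypothetical illustration, not part of any library. A 16 kHz rate is commonly considered adequate for intelligible speech, so the earlier 16000 setting may be worth revisiting once the downmix works:

```csharp
using System;

public static class PcmSize
{
    // Bytes of uncompressed PCM audio produced per second of recording.
    public static int BytesPerSecond(int sampleRate, int channelCount, int bitDepth)
    {
        return sampleRate * channelCount * (bitDepth / 8);
    }
}
```

With 16-bit samples this gives 176400 bytes per second for 44100 Hz stereo, 88200 for 44100 Hz mono, and 32000 for 16000 Hz mono.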
