Mixing audio channels

Question

I am implementing an audio channel mixer and using Viktor T. Toth's algorithm. Trying to mix two audio channel streams.

In the code, quantization_ is the byte representation of the bit depth of a channel. My mix function, takes a pointer to destination and source uint8_t buffers, mixes two channels and writes into the destination buffer. Because I am taking data in a uint8_t buffer, doing that addition, division, and multiplication operations to get the actual 8, 16 or 24-bit samples and convert them again to 8-bit.

Generally, it gives the expected output sample values. However, some samples turn out to have near 0 value as they are not supposed to be when I look the output in Audacity. In the screenshot, bottom 2 signals are two mono channels and the top one is the mixed channel. It can be seen that there are some very low values, especially in the middle.

Below, is my mix function;

void audio_mixer::mix(uint8_t* dest, const uint8_t* source)
{
    uint64_t mixed_sample = 0;
    uint64_t dest_sample = 0;
    uint64_t source_sample = 0;
    uint64_t factor = 0;

    for (int i = 0; i < channel_size_; ++i)
    {
        dest_sample = 0;
        source_sample = 0;
        factor = 1;

        for (int j = 0; j < quantization_; ++j)
        {
            dest_sample += factor * static_cast(*dest++);
            source_sample += factor * static_cast(*source++);
            factor = factor * 256;
        }

        mixed_sample = (dest_sample + source_sample) - (dest_sample * source_sample / factor);

        dest -= quantization_;

        for (int k = 0; k < quantization_; ++k)
        {
            *dest++ = static_cast(mixed_sample % 256);
            mixed_sample = mixed_sample / 256;
        }
    }
}

thomas.cloud · Accepted Answer

It seems like you aren't treating the signed audio samples correctly. The horizontal line should be zero voltage from your audio signal.

If you look at the positive voltage audio samples they obey your equation correctly (except for the peak values in the center). The negative values are being compressed which makes me feel like they are being treated as small positive voltages instead of negative voltages.

In other words, maybe those unsigned ints should be signed ints so the top bit indicates the voltage polarity and you can have audio samples in the range +127 to -128.

Those peak values in the center seem like they are wrapping around modulo 255 which would be the peak value for an unsigned byte representation of your audio. I'm not sure how this would happen but it seems related to the unsigned vs signed signals.

Maybe you should try the other formula Viktor provided in his document:

Z = 2(A+B) - (AB/128) - 256

Mixing audio channels

Answers (1)

Related Questions