Reputation: 337

How to mix voice audio

I am currently developing a simple VoIP project where multiple clients send their voice to a server, and the server later mixes those voices together.

However, I can't mix them directly using simple arithmetic addition. Each cycle, a client sends 3584 bytes of voice data to the mixer.

Below is a snippet of the values contained in a receive buffer:

BYTE buffer[3584];

    [0] 0        unsigned char
    [1] 192 'À'  unsigned char
    [2] 176 '°'  unsigned char
    [3] 61 '='   unsigned char
    [4] 0        unsigned char
    [5] 80 'P'   unsigned char
    [6] 172 '¬'  unsigned char
    [7] 61 '='   unsigned char
    [8] 0        unsigned char
    [9] 144      unsigned char
    [10] 183 '·' unsigned char
    [11] 61 '='  unsigned char
     .
     .
     .

I'm not sure how the client generates this pattern in the buffer, but I think it may be a wave pattern. Now, say I have another buffer of similar data: how do I mix the voices together?

Please help. Thank you.

Upvotes: 2

Answers (4)

Jasper Bekkers

Reputation: 6809

I looked at your data again, and the values appear to be floating point. I was mistaken in my previous post, probably because I have been working on big-endian systems for a while now. Your data is in little-endian IEEE floating point. Here are the values I got after conversion (the hex on the right is the byte sequence as it appears in the buffer):

0.089630127 -> 0x0090b73d
0.084136963 -> 0x0050ac3d
0.086303711 -> 0x00c0b03d
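
The conversion above can be reproduced with a small helper. This is only a sketch: the name `decodeFloatSamples` is made up for illustration, and it assumes the buffer really holds little-endian 32-bit IEEE 754 floats and that the host's `float` uses that same format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: decode a raw byte buffer as little-endian 32-bit floats.
// Assumption: the sender really transmits IEEE 754 binary32 samples.
std::vector<float> decodeFloatSamples(const std::uint8_t* bytes, std::size_t byteCount)
{
    static_assert(sizeof(float) == 4, "this sketch expects a 32-bit float");
    assert(bytes != nullptr || byteCount == 0);

    std::vector<float> samples;
    samples.reserve(byteCount / 4);
    for (std::size_t i = 0; i + 4 <= byteCount; i += 4) {
        // Assemble the 32-bit pattern explicitly so the byte order of the
        // buffer does not depend on the byte order of the host's integers.
        const std::uint32_t bits =
            static_cast<std::uint32_t>(bytes[i]) |
            (static_cast<std::uint32_t>(bytes[i + 1]) << 8) |
            (static_cast<std::uint32_t>(bytes[i + 2]) << 16) |
            (static_cast<std::uint32_t>(bytes[i + 3]) << 24);

        // memcpy is the well-defined way to reinterpret the bits as a float.
        float value;
        std::memcpy(&value, &bits, sizeof(value));
        samples.push_back(value);
    }
    return samples;
}
```

Feeding it the first twelve bytes from the question (0, 192, 176, 61, ...) should give the three values listed above, in buffer order.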

As you can see, the values are fairly small so you'll probably need to take that into account when applying the volume; the usual convention is to have this data either between 0..1 or -1..1 for min and max volumes respectively.

Here is part of a mixing loop I wrote a few years ago; for reference, the full mixer is available here

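    for (int i = 0; i < a_Sample->count() / a_Sample->channels(); i++) {
        // Per-sample gain: the sample's own volume scaled by the master volume.
        float l_Volume = a_Sample->volume() * m_MasterVolume;

        // Accumulate into the interleaved stereo output (left, then right).
        *l_Output++ += *l_Left * l_PanLeft * l_Volume;
        *l_Output++ += *l_Right * l_PanRight * l_Volume;

        // Advance to the next frame of the interleaved source.
        l_Left  += a_Sample->channels();
        l_Right += a_Sample->channels();
    }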

Note that for the output you'll probably need to convert the data to signed integers, so work out whether that conversion is the responsibility of the mixer or of the output device.
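
If the conversion does end up in the mixer, it could look roughly like this. The function name `floatToPcm16` is hypothetical, and the sketch assumes the float samples are nominally in the range -1..1 and that the output device wants signed 16-bit PCM; scaling by 32767 is one common convention, not the only one.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: convert one float sample (nominally -1..1)
// to a signed 16-bit PCM sample.
std::int16_t floatToPcm16(float sample)
{
    assert(!std::isnan(sample));  // NaN has no sensible PCM value

    // Clamp first so that loud mixes saturate instead of wrapping around.
    const float clamped = std::max(-1.0f, std::min(1.0f, sample));

    // Scale to the symmetric 16-bit range and round to the nearest integer.
    return static_cast<std::int16_t>(std::lround(clamped * 32767.0f));
}
```

Values outside -1..1 saturate at -32767 or 32767 rather than overflowing, which is usually less audible than wrap-around.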

Upvotes: 1

Jason Olson

Reputation: 3696

As others have mentioned, you have to know what format the buffer is in. You can't simply operate on the bytes directly (well, you could, but it would become quite complicated). Raw PCM data is often 44100 samples per second (44.1 kHz), 16-bit, 2-channel. However, that's not always the case. Each one of those parameters can be different. That won't affect the mixing too much, but it is an example of what can vary. Even WAV files can be in other formats (like IEEE float). You will need to interpret the buffer correctly as the appropriate data type in order to operate on it.

Like:

BYTE buffer[3584];
if (SampleTypeIsPcm16Bit())
{
    short *data = reinterpret_cast<short *>(buffer);
    // Rock on
}
else if (SampleTypeIsFloat())
{
    float *data = reinterpret_cast<float *>(buffer);
    // Rock on
}

Of course, you can make it more generic with templates, but ignore that for now :P.

Keep in mind that if you are dealing with floats, the mixed result needs to be clamped to the range -1.0 to 1.0.
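
To illustrate the clamping, here is a rough sketch that sums several float streams and limits the result. `mixAndClamp` is a made-up name, and the sketch assumes every stream has already been decoded to floats of the same length and sample rate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sum several float streams sample by sample and
// clamp the result to -1..1.
// Assumption: all streams have the same length and sample rate.
std::vector<float> mixAndClamp(const std::vector<std::vector<float>>& streams)
{
    if (streams.empty())
        return {};

    std::vector<float> mix(streams.front().size(), 0.0f);
    for (const std::vector<float>& stream : streams) {
        assert(stream.size() == mix.size());
        for (std::size_t i = 0; i < mix.size(); ++i)
            mix[i] += stream[i];
    }

    // Anything beyond full scale is clipped instead of being passed on.
    for (float& sample : mix)
        sample = std::max(-1.0f, std::min(1.0f, sample));
    return mix;
}
```

Hard clamping like this distorts when many loud streams overlap; scaling each stream down before summing is the gentler alternative.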

So, are you saying that the "add two values and divide by two" approach (mentioned by Jasper) isn't working? How are you playing the data when you just hear silence? I wonder if the problem lies there, because if your math were off, you would likely hear audio glitches (pops, clicks, etc.) rather than just silence.

Upvotes: 0

Jasper Bekkers

Reputation: 6809

This is probably an array of floats (unlikely due to the byte pattern presented) or signed integers if it's raw PCM data, so try using it as such. Mixing two PCM streams is fairly trivial: just add them together and divide by two (use other weighting for volume control).
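
As a minimal sketch of the add-and-divide approach, assuming both buffers hold signed 16-bit PCM with the same length and sample rate (the name `mixPcm16` is only for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: average two signed 16-bit PCM streams.
// Assumption: both streams have the same length and sample rate.
std::vector<std::int16_t> mixPcm16(const std::vector<std::int16_t>& a,
                                   const std::vector<std::int16_t>& b)
{
    assert(a.size() == b.size());

    std::vector<std::int16_t> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        // Add in a wider type so the sum cannot overflow 16 bits,
        // then halve it to bring it back into range.
        const int sum = static_cast<int>(a[i]) + static_cast<int>(b[i]);
        out[i] = static_cast<std::int16_t>(sum / 2);
    }
    return out;
}
```

Dividing by two can never overflow, at the cost of halving the level of a lone speaker; other weights trade loudness against headroom.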

Upvotes: 2

John Zwinck

Reputation: 249123

You need to find out if your VoIP system uses compression. It probably does, in which case the first thing you need to do is to decompress the streams, then mix them, then recompress.
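
The decompress, mix, recompress pipeline might be structured like the sketch below. Everything here is hypothetical: the `Codec` struct and its `decode`/`encode` hooks stand in for whatever codec your VoIP system actually uses, and the sketch assumes the decoder yields float samples in -1..1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Packet = std::vector<std::uint8_t>;  // compressed bytes from one client
using Pcm = std::vector<float>;            // decoded samples, nominally -1..1

// Hypothetical codec hooks: plug in the real decoder and encoder here.
struct Codec {
    std::function<Pcm(const Packet&)> decode;
    std::function<Packet(const Pcm&)> encode;
};

// Decode every client's packet, sum the samples, clamp, and re-encode.
Packet mixPackets(const std::vector<Packet>& packets, const Codec& codec)
{
    assert(codec.decode && codec.encode);

    Pcm mix;
    for (const Packet& packet : packets) {
        const Pcm pcm = codec.decode(packet);
        if (pcm.size() > mix.size())
            mix.resize(pcm.size(), 0.0f);  // pad shorter streams with silence
        for (std::size_t i = 0; i < pcm.size(); ++i)
            mix[i] += pcm[i];
    }

    for (float& sample : mix)
        sample = std::max(-1.0f, std::min(1.0f, sample));
    return codec.encode(mix);
}
```

The mixing itself never touches compressed bytes; only the two hooks need to know about the codec.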

Upvotes: 3
