Realtime audio mixing algorithm for large audio conference

Question

For an audio conference, I have audio data (a short array of 16-bit, 16 kHz samples) for every participant, and I want to mix them into a single short array so I can play it on the client. Mixing happens on the client because the architecture is SFU-based.

I have searched and found many answers, many of them old. For two samples A and B, one suggests A+B-A*B (which produces unacceptable distortion); another sums all samples and divides by the participant count (which can noticeably lower each participant's volume); a third applies dynamic gain control after summing, tracking the slope to keep the level under control. The main problem lies in the real-time constraints. I tried something like this:

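    // Requires: import java.util.List;
    // Note: the two loops were garbled in the original post; they are
    // reconstructed here from the evident intent (sum as floats, then clamp).
    public synchronized int mix(List<AudioFrameShort> rawData, short[] output, int outOffset) {
        if (rawData.isEmpty()) {
            return 0;
        }
        if (rawData.size() == 1) {
            // Single participant: copy the frame through unchanged.
            AudioFrameShort shortFrame = rawData.get(0);
            System.arraycopy(shortFrame.data, 0, output, outOffset, shortFrame.len);
            return shortFrame.len;
        }
        // Mix only as many samples as the shortest frame provides.
        int dataLength = rawData.get(0).len;
        for (int i = 1; i < rawData.size(); i++) {
            dataLength = Math.min(dataLength, rawData.get(i).len);
        }
        for (int j = 0; j < dataLength; j++) {
            // Sum every participant's sample, normalized to [-1.0, 1.0).
            float mixed = 0.0f;
            for (AudioFrameShort frame : rawData) {
                mixed += frame.data[j] / 32768.0f;
            }
            // Hard-clip the sum to the valid range.
            if (mixed > 1.0f) {
                mixed = 1.0f;
            }
            if (mixed < -1.0f) {
                mixed = -1.0f;
            }
            // Scale by 32767, not 32768: casting 32768 to short wraps to -32768.
            output[outOffset + j] = (short) (mixed * 32767.0f);
        }
        return dataLength;
    }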
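
For comparison, the "dynamic gain control after summing" idea mentioned above could be sketched roughly as follows. This is only a minimal illustration under my own assumptions: the class name `LimitingMixer`, the constant `RELEASE_STEP`, and the roughly half-second recovery time are all hypothetical, and the instant gain drop is a simplification (a production limiter would typically smooth the attack as well).

```java
import java.util.List;

/** Sketch of a sum-then-limit mixer. All names and constants are hypothetical. */
final class LimitingMixer {
    // Assumed release rate: gain climbs from 0 back to 1 over 8000 samples
    // (about 0.5 s at 16 kHz). Tune by ear; this value is not from any product.
    private static final float RELEASE_STEP = 1.0f / 8000;

    // Gain persists across frames so the level recovers gradually
    // instead of jumping at frame boundaries.
    private float gain = 1.0f;

    /** Mixes the frames into out and returns the number of samples written. */
    int mix(List<short[]> frames, short[] out) {
        if (frames.isEmpty()) {
            return 0;
        }
        int n = out.length;
        for (short[] f : frames) {
            n = Math.min(n, f.length);
        }
        for (int j = 0; j < n; j++) {
            // An int accumulator has ample headroom for 16-bit samples.
            int sum = 0;
            for (short[] f : frames) {
                sum += f[j];
            }
            // Attack: if the scaled sum would clip, lower the gain just enough.
            int magnitude = Math.abs(sum);
            if (magnitude * gain > Short.MAX_VALUE) {
                gain = Short.MAX_VALUE / (float) magnitude;
            }
            // Final clamp guards against float rounding at the boundary.
            int scaled = Math.round(sum * gain);
            out[j] = (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, scaled));
            // Release: let the gain creep back toward unity.
            gain = Math.min(1.0f, gain + RELEASE_STEP);
        }
        return n;
    }
}
```

Unlike dividing by the participant count, this leaves quiet passages at full volume and only attenuates when the sum would actually overflow. The trade-off is that a loud peak briefly ducks whatever follows it until the gain recovers.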

My question is: what is the best approach to date, and what algorithm do industry leaders like Skype, Zoom, and Discord follow to mix audio on the client side, both preventing overflow and keeping the output smooth even for large conferences? Thanks in advance.


Answers (0)
