Reputation: 434
For an audio conference, I have audio data (a short array of samples, 16-bit, 16 kHz) for every participant, and I want to mix them into a single short array so I can play it on the client. Mixing happens on the client because we use an SFU architecture.
I have searched and found many answers, a lot of them quite old. For two samples A and B, the suggestions include A+B-A*B (which gives unacceptable distortion), summing all samples and dividing by the participant count (which can cause a noticeable volume drop for each participant), and applying dynamic gain control after summing while tracking the slope to keep the level under control. The main problem is the real-time constraint. I tried something like this:
    public synchronized int mix(ArrayList<AudioFrameShort> rawData, short[] output, int outOffset) {
        if (rawData.size() == 0)
            return 0;
        else if (rawData.size() == 1) {
            // Single participant: copy the frame through unchanged.
            System.out.println("size 1");
            AudioFrameShort shortFrame = rawData.get(0);
            System.arraycopy(shortFrame.data, 0, output, outOffset, shortFrame.len);
            return shortFrame.len;
        }
        // Mix only as many samples as the shortest frame provides.
        int dataLength = rawData.get(0).len;
        for (int i = 1; i < rawData.size(); i++)
            if (rawData.get(i).len < dataLength)
                dataLength = rawData.get(i).len;
        for (int j = 0; j < dataLength; j++) {
            // Sum all participants in the normalized [-1, 1] range.
            double mixed = 0;
            for (int k = 0; k < rawData.size(); k++) {
                double gain = 1; // rawData.get(k).gainControl.getCurrentGain();
                mixed += gain * rawData.get(k).data[j] / 32768.0;
            }
            // Hard-clip the sum to [-1, 1].
            if (mixed > 1.0) {
                mixed = 1.0;
            }
            if (mixed < -1.0) {
                mixed = -1.0;
            }
            // Scale by 32767: with 32768, +1.0 would wrap to -32768 on the cast to short.
            output[outOffset + j] = (short) (mixed * 32767.0);
        }
        return dataLength;
    }
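For comparison, the "sum, then apply dynamic gain" idea mentioned above could be sketched roughly as below. This is only a minimal illustration under my own assumptions: the class name `LimiterMixer`, the plain `short[][]` input, and the `RELEASE` constant are hypothetical, and I am not claiming this is what any of the products named in this question actually do.

```java
public class LimiterMixer {
    // Hypothetical tuning value: fraction of the remaining gain recovered per sample.
    private static final double RELEASE = 0.0005;

    // Current limiter gain, kept across calls so the level changes smoothly between frames.
    private double gain = 1.0;

    /** Mixes frames[k][0..len) into out[outOffset..outOffset+len) and returns len. */
    public int mix(short[][] frames, int len, short[] out, int outOffset) {
        for (int j = 0; j < len; j++) {
            // Sum in int so adding many 16-bit samples cannot overflow.
            int sum = 0;
            for (short[] frame : frames) {
                sum += frame[j];
            }
            int magnitude = Math.abs(sum);

            // Instant attack: lower the gain just enough to keep this sample in range.
            double g = gain;
            if (magnitude * g > Short.MAX_VALUE) {
                g = Short.MAX_VALUE / (double) magnitude;
            }
            out[outOffset + j] = (short) Math.round(sum * g);

            // Slow release: let the gain drift back toward 1.0 after the peak has passed.
            gain = g + (1.0 - g) * RELEASE;
        }
        return len;
    }
}
```

Quiet passages pass through at unity gain, so a single speaker is not attenuated the way they would be when dividing by the participant count; only peaks that would overflow pull the gain down. The per-sample cost is one pass over the participants plus a few arithmetic operations.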
My question is: what is the best approach today, and what algorithms do industry leaders like Skype, Zoom, and Discord use to mix audio on the client side so that it both prevents overflow and stays smooth even for large conferences? Thanks in advance.
Upvotes: 2
Views: 177