Reputation: 474
I am trying to add conference chat support to an existing single-mic chat application (one in which only one person can talk at a time). Streaming between the clients already works, and the voice records and plays back well on both computers that are using the mic. But when a third person receives the packets from both speakers, the audio comes out sounding wrong. I searched around and found out that I need to mix the two streams and then play them as one. I tried a few algorithms I found on the internet, but I am not getting the result I need.
I am using Speex as the encoder/decoder. After decoding the incoming streams on the client side, I tried mixing the two byte arrays with the following algorithms.
Var Buffer1, Buffer2, MixedBuf: TIdBytes;
Begin
  For I := 0 To Length(Buffer1) - 1 Do Begin
    If Length(Buffer2) >= I Then
      MixedBuf[I] := (Buffer1[I] + Buffer2[I]) / 2
    Else
      MixedBuf[I] := Buffer1[I];
  End;
End;
The received buffers are either 492 or 462 bytes long, so if Buffer2 is smaller than Buffer1, I mix the first 462 bytes and copy the remaining bytes unaltered into MixedBuf.
This algorithm produces a lot of noise and distortion, and only part of the voice can be heard.
Another algorithm, which I found here on Stack Overflow in an answer by Mark Heath, is to first convert the bytes to floating-point values.
Var Buffer1, Buffer2, MixedBuf: TIdBytes;
  samplef1, samplef2, Mixed: Extended;
Begin
  For I := 0 To Length(Buffer1) - 1 Do Begin
    If Length(Buffer2) >= I Then Begin
      samplef1 := Buffer1[I] / 65535;
      samplef2 := Buffer2[I] / 65535;
      Mixed := samplef1 + samplef2;
      if (Mixed > 1.0) Then Mixed := 1.0;
      if (Mixed < -1.0) Then Mixed := -1.0;
      MixedBuf[I] := Round(Mixed * 65535);
    End Else
      MixedBuf[I] := Buffer1[I];
  End;
End;
The value never goes below 0, but I left in the check for values below -1.0 as it was in the original algorithm. This method works a lot better, but there is still noise and distortion, and the voice from the second stream is always very faint while the voice from the first stream is as loud as it is supposed to be. Even when the first person is not talking, the second voice is faint.
P.S. Some details about the audio stream:
The tWAVEFORMATEX record for audio recording and playback is set up as follows:
FWaveFormat.wFormatTag := WAVE_FORMAT_PCM;
FWaveFormat.nChannels := 1;
FWaveFormat.nSamplesPerSec := WAVESAMPLERATE; // i.e WAVESAMPLERATE = 16000
FWaveFormat.nAvgBytesPerSec := WAVESAMPLERATE*2;
FWaveFormat.nBlockAlign := 2;
FWaveFormat.wBitsPerSample := 16;
FWaveFormat.cbSize := SizeOf(tWAVEFORMATEX);
I hope I am providing all the information needed.
Upvotes: 1
Views: 1036
Reputation: 613612
FWaveFormat.wBitsPerSample := 16;
You need to respect the fact that your samples are 16 bits wide. Your code operates on 8 bits at a time. You could write it something like this:
// uses Math (Max, EnsureRange) and SysUtils (TBytes)
function MixAudioStreams(const strm1, strm2: TBytes): TBytes;
// assumes 16 bit samples, single channel, common sample rate
var
  i: Integer;
  n1, n2, nRes: Integer;
  p1, p2, pRes: PSmallInt;
  samp1, samp2: Integer;
begin
  Assert(Length(strm1) mod 2 = 0);
  Assert(Length(strm2) mod 2 = 0);
  n1 := Length(strm1) div 2;
  n2 := Length(strm2) div 2;
  nRes := Max(n1, n2);
  SetLength(Result, nRes*2);
  p1 := PSmallInt(strm1);
  p2 := PSmallInt(strm2);
  pRes := PSmallInt(Result);
  for i := 0 to nRes-1 do begin
    if i < n1 then begin
      samp1 := p1^;
      inc(p1);
    end else begin
      samp1 := 0;
    end;
    if i < n2 then begin
      samp2 := p2^;
      inc(p2);
    end else begin
      samp2 := 0;
    end;
    pRes^ := EnsureRange(
      (samp1+samp2) div 2,
      low(pRes^),
      high(pRes^)
    );
    inc(pRes);
  end;
end;
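To see why operating on 8 bits at a time damages 16-bit audio, here is a minimal sketch. It assumes little-endian samples (as on x86 Windows), and `MixByteWise` and `MixSampleWise` are hypothetical helper names introduced only for this illustration. It averages the same pair of near-silent samples both ways:

```pascal
program ByteVsSampleMix;
{$APPTYPE CONSOLE}

type
  TSampleBytes = array[0..1] of Byte;

// Hypothetical helper: averages two 16-bit samples one byte at a time,
// which is what a loop over the raw byte buffers ends up doing.
function MixByteWise(a, b: SmallInt): SmallInt;
var
  ba, bb, bm: TSampleBytes;
begin
  Move(a, ba, SizeOf(a));
  Move(b, bb, SizeOf(b));
  bm[0] := (ba[0] + bb[0]) div 2; // low bytes averaged on their own
  bm[1] := (ba[1] + bb[1]) div 2; // high bytes averaged on their own
  Move(bm, Result, SizeOf(Result));
end;

// Hypothetical helper: averages the two samples as whole signed 16-bit values.
function MixSampleWise(a, b: SmallInt): SmallInt;
begin
  Result := (Integer(a) + Integer(b)) div 2;
end;

begin
  // -1 is stored as bytes $FF $FF, and 1 as bytes $01 $00 (little-endian).
  WriteLn(MixSampleWise(-1, 1)); // prints 0
  WriteLn(MixByteWise(-1, 1));   // prints 32640 ($7F80)
end.
```

Two samples that are practically silence mix to nearly full scale when the bytes are averaged independently, which would be consistent with the noise and distortion described in the question.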
Some people recommend dividing the sum by sqrt(2) rather than by 2, to maintain the combined power of the two signals. That would look like this:
pRes^ := EnsureRange(
  Round((samp1+samp2) / Sqrt(2.0)),
  low(pRes^),
  high(pRes^)
);
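As a rough illustration of how the two scalings differ, the sketch below applies each one to individual sample values. `MixAverage` and `MixEqualPower` are hypothetical names used only for this example, and the sample values are made up:

```pascal
program MixScalingDemo;
{$APPTYPE CONSOLE}

uses
  Math;

// Hypothetical helper: mixes two samples by dividing the sum by 2.
function MixAverage(s1, s2: SmallInt): SmallInt;
begin
  Result := (Integer(s1) + Integer(s2)) div 2;
end;

// Hypothetical helper: mixes two samples by dividing the sum by sqrt(2),
// then clamps the result to the signed 16-bit range.
function MixEqualPower(s1, s2: SmallInt): SmallInt;
var
  scaled: Integer;
begin
  scaled := Round((Integer(s1) + Integer(s2)) / Sqrt(2.0));
  Result := EnsureRange(scaled, Low(SmallInt), High(SmallInt));
end;

begin
  // One speaker talking, the other silent.
  WriteLn(MixAverage(20000, 0));        // prints 10000
  WriteLn(MixEqualPower(20000, 0));     // prints 14142
  // Both speakers loud at the same instant.
  WriteLn(MixAverage(30000, 30000));    // prints 30000
  WriteLn(MixEqualPower(30000, 30000)); // prints 32767 (clamped)
end.
```

Dividing by 2 can never overflow but halves a lone speaker's level. Dividing by sqrt(2) keeps a lone speaker louder, at the cost of clipping when both signals are loud at once, so the clamp only does real work in the second variant.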
Upvotes: 4