Reputation: 441
I'm currently trying to reproduce the getSpectrum function of the FMOD audio library. This function reads the PCM data of the currently playing buffer, applies a window to this data, and applies an FFT to get the spectrum.
It returns an array of floats where each float is between 0 and 1 dB (10.0f * (float)log10(val) * 2.0f).
I'm not sure that what I'm doing is what I should be doing, so I'll explain it:
First, I read the PCM data into a 4096-byte buffer. According to the documentation, PCM data is composed of samples, each of which is a left-right pair of values.
In my case I'm working with 16-bit samples, as in the image above. So, to work only with the left channel, I save the PCM data in a short array like this:
short *data = malloc(4096);
FMOD_Sound_ReadData(sound, (void *)data, 4096, &read);
So if a sample is 4 bytes, I have 1024 samples, i.e. 1024 shorts representing the left channel and 1024 shorts representing the right channel.
In order to perform the FFT, I need a float array, and I need to apply a window (Hanning) to my data:
float hanningWindow(short in, size_t i, size_t s)
{
    return in*0.5f*(1.0f-cos(2.0f*M_PI*(float)(i)/(float)(s-1.0f)));
}
where in is the input, i is the position in the array, and s is the size of the array (1024).
To get only the left channel:
float *input = malloc(1024*sizeof(float));
for (i = 0; i < 1024; i++)
    input[i] = hanningWindow(data[i*2], i, 1024);
Then I perform the FFT with kiss_fft (from real to complex). I get a kiss_fft_cpx *output (array of complex) of size 1024/2+1 = 513.
I calculate the amplitude of each frequency with :
kiss_fft_cpx c = output[i];
float amp = sqrt(c.r*c.r + c.i*c.i);
Then I convert it to dB:
amp = 10.0f * (float)log10(amp) * 2.0f;
amp is not between 0 and 1. I don't know where I have to normalize my data (on the PCM data or at the end). I'm also not sure whether the way I'm applying my window to the PCM data is correct.
Here is the result I get from a 0 to 20kHz song compared to the result of the getSpectrum function. (for a rectangular window)
[Images: "My Result" and "getSpectrum Result"]
How can I achieve the same result?
Upvotes: 2
Views: 5024
Reputation: 212979
You're a little confused about log (dB) scales - you don't get a range of 0 - 1 dB, you get a range of typically 96 dB for 16 bit audio, where the upper and lower end are somewhat arbitrary, e.g. 0 to -96 dB, or 96 dB to 0 dB, or any other range you like, depending on various factors. You probably just need to shift and scale your spectrogram plotting by a suitable offset and factor to account for this.
(Note: the range of 96 dB comes from the formula 20 * log10(2^16), where 16 is the number of bits.)
Upvotes: 2