user103952

Reputation:

Normalizing FFT Data for Human Hearing

The typical FFT for audio looks pretty similar to this, with most of the action happening on the far left side

http://www.flight404.com/blog/images/fft.jpg

The blog's author multiplied it by a partial sine wave to flatten it out, but the article isn't very specific about that step. It also seems like a "good enough" tweak to the dataset rather than one based on some actual property. I understand that human hearing is more sensitive to the higher frequencies; thus, most music has amplified bass and attenuated treble so that both sound roughly equally strong to us.

My question is what modification needs to be done to the FFT to compensate for this standard falloff?

for (var i = 0; i < fft.length; i++) {
    fft[i] = fft[i] * Math.log(i + 1); // does, eh, OK, but the high end
                                       // is still not really "loud" enough
}

EDIT ::

http://en.wikipedia.org/wiki/Equal-loudness_contour

I came across this article; I think it might be the direction to head in, but there may still be some property of the FFT itself that needs to be counteracted.
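One concrete stand-in for the equal-loudness idea is A-weighting, whose closed-form curve approximates the ear's sensitivity at moderate levels. A sketch of applying it per FFT bin (the function names, `sampleRate`, and the bin-to-frequency mapping are my own assumptions for illustration, not anything from the linked blog):

```javascript
// Linear A-weighting gain R_A(f) from the standard closed-form expression.
// Multiplying each bin's magnitude by this attenuates the bass-heavy left
// side, flattening the display toward perceived loudness.
function aWeight(f) {
  var f2 = f * f;
  var num = Math.pow(12194, 2) * f2 * f2;
  var den = (f2 + Math.pow(20.6, 2)) *
            Math.sqrt((f2 + Math.pow(107.7, 2)) * (f2 + Math.pow(737.9, 2))) *
            (f2 + Math.pow(12194, 2));
  return num / den;
}

// Weight an array of FFT magnitudes; bin i covers frequency
// i * sampleRate / (2 * fft.length) (assuming fft holds bins up to Nyquist).
function weightFFT(fft, sampleRate) {
  var out = new Array(fft.length);
  for (var i = 0; i < fft.length; i++) {
    var f = i * sampleRate / (2 * fft.length);
    out[i] = fft[i] * aWeight(f);
  }
  return out;
}
```

Note that the raw `aWeight` gain at 1 kHz is about 0.79, not 1.0; the published A-weighting numbers include a +2 dB normalization so the curve reads 0 dB at 1 kHz.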

Upvotes: 6

Views: 2819

Answers (4)

tom10

Reputation: 69242

First, are you sure you want to do this? It makes sense to compensate for some things, like the microphone response not being flat, but not human perception. People are used to hearing sounds with the spectral content that the sounds have in the real world, not remapped along perceptual equal-loudness curves. If you play a sound that you've modified in the way you suggest, it will sound strange. Maybe some people like the music to have enhanced low frequencies, but this is a matter of taste, not psychophysics.

Or maybe you are compensating for some other reason, for example, taking into account the poorer sensitivity to lower frequencies might enhance a compression algorithm. Is this the idea?

If you do want to normalize by the equal-loudness curves, note that most of the curves and equations are given in terms of sound pressure level (SPL). SPL is the log of the square of the waveform amplitude, so when you work with the FFTs, it's probably easiest to work with their square (the power spectra). (Or, of course, you could compensate in other ways by, say, multiplying by sqrt(log(i+1)) in your equation above -- assuming that the log was an approximation of the inverse equal-loudness curve.)
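To make the amplitude-vs-power point concrete, here is a small sketch using the question's log(i+1) curve as the stand-in weighting (the function names are mine; the point is only that a weight defined on power corresponds to its square root on amplitude):

```javascript
// Apply the weight in the power domain: square the magnitude first,
// then multiply by the curve.
function compensatePower(fft) {
  var out = new Array(fft.length);
  for (var i = 0; i < fft.length; i++) {
    var power = fft[i] * fft[i];       // power spectrum bin
    out[i] = power * Math.log(i + 1);  // curve defined on power / SPL
  }
  return out;
}

// Equivalent compensation applied directly to amplitudes: take the
// square root of the power-domain weight.
function compensateAmplitude(fft) {
  var out = new Array(fft.length);
  for (var i = 0; i < fft.length; i++) {
    out[i] = fft[i] * Math.sqrt(Math.log(i + 1));
  }
  return out;
}
```

Squaring the output of `compensateAmplitude` gives exactly the output of `compensatePower`, which is why the sqrt shows up when you move a power-domain curve onto an amplitude spectrum.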

Upvotes: 3

Ludwig Weinzierl

Reputation: 16624

I think the equal-loudness contour is exactly the right direction. However, its shape depends on the absolute pressure level. In other words, the sensitivity curve of our hearing changes with sound pressure.

There is no "correct normalization" if you have no information about absolute levels. Whether this is a problem depends on what you want to do with the data.

The loudness contour is standardized in ISO 226, but this document is not freely available for download. It should be in a decent university library, though. Here is another source for loudness contours.

Upvotes: 3

rama-jka toti

Reputation: 1436

In the old days of the first samplers (this is before the MOTU Boost people :), it wasn't FFT-based but simple normalisation done on the original or resulting time-domain signal (Fairlight or Roland did it first, I think), e.g. if you are doing ReCycle-style beat slicing. Can't you do that? Or only go for the FFT after you compensate to counteract it?

Seems like a two-phase procedure otherwise; I'd personally leave the FFT as is for the task.

Upvotes: 0

Rob Elsner

Reputation: 821

So you are trying to raise the level of the high-end frequencies? Sounds like a high-pass filter with a minimum multiplier might work, so that you don't attenuate the low-frequency signals too much. Pick up a good book on filter design, or maybe monkey around with this applet.
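The "high-pass with a minimum multiplier" idea above could be sketched as a per-bin gain ramp with a floor (the `minGain` parameter and the linear ramp are assumptions for illustration, not a proper filter design):

```javascript
// Shelf-like gain over FFT bins: ramps from 0 at DC to 1 at the top bin,
// but never drops below minGain, so the bass is tamed rather than removed.
function shelfGain(fft, minGain) {
  var out = new Array(fft.length);
  for (var i = 0; i < fft.length; i++) {
    var ramp = i / (fft.length - 1);      // 0 at DC, 1 at the highest bin
    var gain = Math.max(minGain, ramp);   // floor keeps low bins audible
    out[i] = fft[i] * gain;
  }
  return out;
}
```

For display purposes this is cheap and adjustable; for actual audio processing, a designed filter (as the answer suggests) would behave much better at the band edges.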

Upvotes: 1
