Pavel

Reputation: 7562

Methods for estimating the SNR of an audio file?

How do I estimate the SNR from a single audio file containing speech? I know of two methods:

  1. log power histogram percentile difference (aka "NIST quick method"), described here: http://labrosa.ee.columbia.edu/~dpwe/tmp/nist/doc/stnr.txt

  2. 10*log10( (S-N)/N ), where

    • S = sum{x[i]^2 * e[i]}
    • N = sum{x[i]^2 * (1-e[i])}
    • e[i] = output of some sort of voice activity detection (speech/non-speech indicator: 1 for speech, 0 for non-speech)
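
A minimal sketch of method 2, assuming the indicator e[i] is already available as an array of 0s and 1s with the same length as the signal. The function name `snr_from_vad` is made up for illustration. The formula is taken literally, so S and N are plain sums and are not normalized by the number of speech and non-speech samples.

```python
import numpy as np

def snr_from_vad(x, e):
    """SNR in dB as 10*log10((S-N)/N), with e[i] = 1 for speech, 0 otherwise."""
    x = np.asarray(x, dtype=float)
    e = np.asarray(e, dtype=float)
    S = np.sum(x**2 * e)          # energy of samples flagged as speech
    N = np.sum(x**2 * (1.0 - e))  # energy of samples flagged as non-speech
    if N <= 0 or S <= N:
        return float("nan")       # ratio undefined or not positive
    return 10.0 * np.log10((S - N) / N)
```

For example, `snr_from_vad([2, 2, 1, 1], [1, 1, 0, 0])` gives S = 8 and N = 2, so the result is 10*log10(3), about 4.77 dB.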

Are there any better methods that do not require stereo data (or the same data in both a clean and a noisy version)? I would also like to avoid the "second method" described in the NIST document (see 1.), which makes strong assumptions about the distributions.

Upvotes: 3

Views: 7039

Answers (1)

heyo

Reputation: 79

The frequencies that matter most for speech intelligibility lie between about 300 Hz and 3 kHz, which is roughly the band that (old) telephone systems carry. Speech never occupies all of these frequencies at the same time, so a frequency analysis can find the noise floor without any reference signal or voice activity detection e[i]:

  1. Compute an FFT with a frequency resolution of ~10 - 20 Hz. With a sample rate of 48 kHz and a resolution of 10 Hz you would use an FFT length of samplerate/resolution = 4800 samples, which should then be rounded to the nearest power of 2, which is 4096 (an actual resolution of about 11.7 Hz).

  2. Identify the bins that hold the results for 300 - 3000 Hz. Bin index k holds the result for frequency k*samplerate/FFT_length. For the 48 kHz input and FFT length 4096 above, this is k(300 Hz) = 300 * 4096 / 48000 = 25.6, so bin 25 when rounding down, and k(3000 Hz) = 3000 * 4096 / 48000 = 256.

  3. Calculate the energy in each of these bins: E[k] = FFT[k].re^2 + FFT[k].im^2. Where the real and imaginary parts are stored depends on your FFT implementation.

  4. N = min{ E[k=25..256] } * number_of_bins (= 256-25+1 = 232)

  5. S = sum{ E[k=25..256] }

  6. SNR = (S-N)/N as a power ratio. In decibels, the level is 10*log10(SNR).

  7. As the SNR varies over time, go back to step 1 with a new block of samples, probably with some overlap between blocks.
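
The steps above could be sketched in Python with NumPy roughly as follows. This is only an illustration under stated assumptions: the function names (`band_bins`, `frame_snr_db`, `snr_over_time`), the 50% overlap (`hop=2048`) and rounding the bin indices down are choices made here, not part of the method as described. No window function is applied, because the steps do not mention one.

```python
import numpy as np

def band_bins(samplerate=48000, fft_length=4096, f_lo=300.0, f_hi=3000.0):
    """Step 2: bin indices covering f_lo..f_hi (rounded down; an assumption)."""
    k_lo = int(f_lo * fft_length / samplerate)
    k_hi = int(f_hi * fft_length / samplerate)
    return k_lo, k_hi

def frame_snr_db(frame, samplerate=48000, f_lo=300.0, f_hi=3000.0):
    """Steps 1-6 for one block of samples; returns the SNR level in dB."""
    frame = np.asarray(frame, dtype=float)
    spectrum = np.fft.rfft(frame)                  # step 1: FFT of a real signal
    k_lo, k_hi = band_bins(samplerate, len(frame), f_lo, f_hi)
    energy = spectrum.real**2 + spectrum.imag**2   # step 3: E[k] = re^2 + im^2
    band = energy[k_lo:k_hi + 1]                   # bins k_lo..k_hi inclusive
    N = band.min() * band.size                     # step 4: noise floor estimate
    S = band.sum()                                 # step 5: total band energy
    if N <= 0 or S <= N:
        return float("nan")                        # ratio undefined or not positive
    return 10.0 * np.log10((S - N) / N)            # step 6: level in dB

def snr_over_time(x, samplerate=48000, fft_length=4096, hop=2048):
    """Step 7: repeat on successive blocks; hop < fft_length gives overlap."""
    x = np.asarray(x, dtype=float)
    starts = range(0, len(x) - fft_length + 1, hop)
    return [frame_snr_db(x[i:i + fft_length], samplerate) for i in starts]
```

Calling `snr_over_time(samples)` on a mono signal sampled at 48 kHz returns one estimate per block. A block whose band is perfectly flat (S equal to N) or silent yields NaN, since (S-N)/N has no logarithm there.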

Upvotes: 8
