Reputation: 7562
how do I estimate SNR from a single audio file containing speech? I know of two methods:
log power histogram pecentile difference (aka "NIST quick method"), described here: http://labrosa.ee.columbia.edu/~dpwe/tmp/nist/doc/stnr.txt
10*log10( (S-N)/N ), where
are there any better methods that do not require stereo data (or data in both clean and noisy version)? I also would like to avoid the "second method" described in the NIST document (see 1.) that makes strong assumptions about the distributions.
Upvotes: 3
Views: 7039
Reputation: 79
Human voice uses frequencies from 300 Hz to 3 kHz. This is what (old) telephone systems are using. Human voice never uses all these frequencies at a time, this is why we can do a frequency analysis for finding the noise floor - without any reference or voice activity detection e[i]:
Compute FFT with a frequency resolution of ~ 10 - 20 Hz. With a samplerate of 48 kHz you would use an FFT length of samplerate/resolution = 4800 samples, which should the get rounded to the nearest power of 2, which is 4096
Identify the necessary bins which hold the results from 300 - 3000 Hz. The bin index k holds the result for frequency k*samplerate/FFT_length. For above 48 kHz input and FFT length 4096 this is k(300 Hz) = 300 * 4096 / 48000 ~= 25 and k(3000 Hz) = 3000 * 4096 / 48000 ~= 250.
Calculate the energy in each necessary bin: E[k] = FFT[k].re ^2 + FFT[k].im ^2. It depends on your FFT algorithm "where" the real and imaginary parts are written.
N = min{ E[k=25..250] } * number_of_bins (=250-25+1)
S = sum{ E[k=25..250] }
SNR = (S-N)/N. The level is 10*log10(SNR)
As the SNR varies over time, go back to step 1 with some new samples - probably with some overlap
Upvotes: 8