Reputation: 12679
I am new to DSP and I am trying to calculate the fundamental frequency ( f(0)
) for each segmented frame of an audio file. The methods of f(0) estimation can be divided into three categories: time-domain, frequency-domain, and hybrid approaches.
Most of the examples I have found estimate the fundamental frequency from the frequency structure of the signal (frequency domain); I am looking for a method based on the temporal dynamics of the signal (time domain).
This article provides some information, but I am still not clear on how to calculate it in the time domain: https://gist.github.com/endolith/255291
This is the code I have found and used so far:
from numpy import argmax, diff, nonzero
from scipy.signal import correlate
from parabolic import parabolic  # interpolation helper from the linked gist

def freq_from_autocorr(sig, fs):
    """
    Estimate frequency using autocorrelation
    """
    # Calculate autocorrelation and throw away the negative lags
    corr = correlate(sig, sig, mode='full')
    corr = corr[len(corr)//2:]

    # Find the first low point
    d = diff(corr)
    start = nonzero(d > 0)[0][0]

    # Find the next peak after the low point (other than 0 lag). This bit is
    # not reliable for long signals, due to the desired peak occurring between
    # samples, and other peaks appearing higher.
    # Should use a weighting function to de-emphasize the peaks at longer lags.
    peak = argmax(corr[start:]) + start
    px, py = parabolic(corr, peak)

    return fs / px
How do I estimate the fundamental frequency in the time domain?
Thanks in advance!
Upvotes: 2
Views: 2844
Reputation: 11407
It is a correct implementation: not very robust, but it certainly works. To verify this, we can generate a signal of known frequency and see what result we get:
import numpy as np
from scipy.io import wavfile
from scipy.signal import correlate, fftconvolve
from scipy.interpolate import interp1d
fs = 44100
frequency = 440
length = 0.01 # in seconds
t = np.linspace(0, length, int(fs * length))
y = np.sin(frequency * 2 * np.pi * t)
def parabolic(f, x):
    """Quadratic interpolation for estimating the true position of an
    inter-sample maximum from the sample at index x and its two neighbours."""
    xv = 1/2. * (f[x-1] - f[x+1]) / (f[x-1] - 2 * f[x] + f[x+1]) + x
    yv = f[x] - 1/4. * (f[x-1] - f[x+1]) * (xv - x)
    return (xv, yv)
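# A quick illustrative check of parabolic() (my addition, not part of the
# original answer): for samples [2, 3, 1] with the sample maximum at index 1,
# the fitted parabola's vertex lies slightly to the left of that sample.
print(parabolic([2, 3, 1], 1))  # -> (~0.833, ~3.042)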
def freq_from_autocorr(sig, fs):
    """
    Estimate frequency using autocorrelation
    """
    corr = correlate(sig, sig, mode='full')
    corr = corr[len(corr)//2:]
    d = np.diff(corr)
    start = np.nonzero(d > 0)[0][0]
    peak = np.argmax(corr[start:]) + start
    px, py = parabolic(corr, peak)
    return fs / px
Running freq_from_autocorr(y, fs) gets us ~442.014 Hz, roughly 0.45% error.
We can make it more precise and robust with slightly more coding:
def indexes(y, thres=0.3, min_dist=1, thres_abs=False):
    """Peak detection routine borrowed from
    https://bitbucket.org/lucashnegri/peakutils/src/master/peakutils/peak.py
    """
    if isinstance(y, np.ndarray) and np.issubdtype(y.dtype, np.unsignedinteger):
        raise ValueError("y must be signed")

    if not thres_abs:
        thres = thres * (np.max(y) - np.min(y)) + np.min(y)

    min_dist = int(min_dist)

    # compute first order difference
    dy = np.diff(y)

    # propagate left and right values successively to fill all plateau pixels (0-value)
    zeros, = np.where(dy == 0)

    # check if the signal is totally flat
    if len(zeros) == len(y) - 1:
        return np.array([])

    if len(zeros):
        # compute first order difference of zero indexes
        zeros_diff = np.diff(zeros)
        # check when zeros are not chained together
        zeros_diff_not_one, = np.add(np.where(zeros_diff != 1), 1)
        # make an array of the chained zero indexes
        zero_plateaus = np.split(zeros, zeros_diff_not_one)

        # fix if leftmost value in dy is zero
        if zero_plateaus[0][0] == 0:
            dy[zero_plateaus[0]] = dy[zero_plateaus[0][-1] + 1]
            zero_plateaus.pop(0)

        # fix if rightmost value of dy is zero
        if len(zero_plateaus) and zero_plateaus[-1][-1] == len(dy) - 1:
            dy[zero_plateaus[-1]] = dy[zero_plateaus[-1][0] - 1]
            zero_plateaus.pop(-1)

        # for each chain of zero indexes
        for plateau in zero_plateaus:
            median = np.median(plateau)
            # set leftmost values to leftmost non zero values
            dy[plateau[plateau < median]] = dy[plateau[0] - 1]
            # set rightmost and middle values to rightmost non zero values
            dy[plateau[plateau >= median]] = dy[plateau[-1] + 1]

    # find the peaks by using the first order difference
    peaks = np.where(
        (np.hstack([dy, 0.0]) < 0.0)
        & (np.hstack([0.0, dy]) > 0.0)
        & (np.greater(y, thres))
    )[0]

    # handle multiple peaks, respecting the minimum distance
    if peaks.size > 1 and min_dist > 1:
        highest = peaks[np.argsort(y[peaks])][::-1]
        rem = np.ones(y.size, dtype=bool)
        rem[peaks] = False

        for peak in highest:
            if not rem[peak]:
                sl = slice(max(0, peak - min_dist), peak + min_dist + 1)
                rem[sl] = True
                rem[peak] = False

        peaks = np.arange(y.size)[~rem]

    return peaks
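# Illustrative usage of indexes() (my addition, not part of the original
# answer): with the default relative threshold, only maxima above
# thres * (max - min) survive. Here the small bump at index 1 (0.5) falls
# below 0.3 * (3.0 - 0.0) = 0.9, so only the peak at index 4 is reported.
print(indexes(np.array([0.0, 0.5, 0.2, 0.5, 3.0, 0.1]), thres=0.3))  # -> [4]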
def freq_from_autocorr_improved(signal, fs):
    signal -= np.mean(signal)  # Remove DC offset
    corr = fftconvolve(signal, signal[::-1], mode='full')
    corr = corr[len(corr)//2:]
    # Find the first peak after the zero-lag maximum
    i_peak = indexes(corr, thres=0.8, min_dist=5)[0]
    i_interp = parabolic(corr, i_peak)[0]
    return fs / i_interp, corr, i_interp
Running freq_from_autocorr_improved(y, fs) yields ~441.825 Hz, roughly 0.41% error. This method performs better in more complex cases, but takes about twice as long to compute.
By sampling for longer (e.g. setting length to 0.1 s) we will obtain more accurate results.
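To tie this back to the per-frame part of the question: below is a minimal sketch (my own addition, not part of the original answer; the file name 'audio.wav', the 30 ms frame length and the 50% hop are placeholder choices) of how the improved estimator could be run over successive frames of a wav file:
fs_wav, data = wavfile.read('audio.wav')  # placeholder file name
data = data.astype(np.float64)
if data.ndim > 1:  # mix stereo down to mono
    data = data.mean(axis=1)

frame_len = int(0.03 * fs_wav)  # 30 ms frames (assumed)
hop = frame_len // 2            # 50% overlap (assumed)

f0_track = []
for i in range(0, len(data) - frame_len + 1, hop):
    frame = data[i:i + frame_len].copy()  # copy: the estimator modifies its input
    try:
        f0_track.append(freq_from_autocorr_improved(frame, fs_wav)[0])
    except IndexError:  # no clear peak found, e.g. a silent frame
        f0_track.append(0.0)
Whether 0.0 is a sensible placeholder for unvoiced or silent frames depends on what you do with the f(0) track afterwards.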
Upvotes: 5