Reputation: 1756
I am trying to figure out what it is I don't understand here.
I am following http://www.scipy.org/Cookbook/FittingData and trying to fit a sine wave. The real problem is satellite magnetometer data which makes a nice sine wave on a spinning spacecraft. I created a dataset then am trying to fit it to recover the inputs.
Here is my code:
import numpy as np
from scipy import optimize
from scipy.optimize import curve_fit, leastsq
import matplotlib.pyplot as plt
class Parameter:
def __init__(self, value):
self.value = value
def set(self, value):
self.value = value
def __call__(self):
return self.value
def fit(function, parameters, y, x = None):
def f(params):
i = 0
for p in parameters:
p.set(params[i])
i += 1
return y - function(x)
if x is None: x = np.arange(y.shape[0])
p = [param() for param in parameters]
return optimize.leastsq(f, p, full_output=True, ftol=1e-6, xtol=1e-6)
# generate a perfect data set (my real data have tiny error)
def mysine(x, a1, a2, a3):
return a1 * np.sin(a2 * x + a3)
xReal = np.arange(500)/10.
a1 = 200.
a2 = 2*np.pi/10.5 # omega, 10.5 is the period
a3 = np.deg2rad(10.) # 10 degree phase offset
yReal = mysine(xReal, a1, a2, a3)
# plot the real data
plt.figure(figsize=(15,5))
plt.plot(xReal, yReal, 'r', label='Real Values')
# giving initial parameters
amplitude = Parameter(175.)
frequency = Parameter(2*np.pi/8.)
phase = Parameter(0.0)
# define your function:
def f(x): return amplitude() * np.sin(frequency() * x + phase())
# fit! (given that data is an array with the data to fit)
out = fit(f, [amplitude, frequency, phase], yReal, xReal)
period = 2*np.pi/frequency()
print amplitude(), period, np.rad2deg(phase())
xx = np.linspace(0, np.max(xReal), 50)
plt.plot( xx, f(xx) , label='fit')
plt.legend(shadow=True, fancybox=True)
Which makes this plot:
The recovered fit parameters of [44.2434221897 8.094832581 -61.6204033699]
have no resemblance to what I started with.
Any thoughts on what I am not understanding or doing wrong?
scipy.__version__
'0.10.1'
Edit:
Fixing one parameter was suggested. In the example above fixing the amplitude to np.histogram(yReal)[1][-1]
still produces unacceptable output. Fits: [175.0 8.31681375217 6.0]
Should I try a different fitting method? Suggestions on which?
Upvotes: 5
Views: 9012
Reputation: 61
There is a python library called pyestimate which implements sinusoid fit. It can fit a noisy sine wave:
#!pip install pyestimate
from pyestimate import sin_param_estimate
import numpy as np
import matplotlib.pyplot as plt
# define a signal to be fitted
n = np.arange(100)
s = 1.234 * np.cos(2*np.pi*0.0345*n + np.pi/7)
# add some noise
x = s + np.random.normal(scale=0.5, size=len(n))
# fit amplitude, frequency and phase
A,f,phi = sin_param_estimate(x)
# plot result
plt.plot(s, '-', label='original sine')
plt.plot(x, '.', label='noisy input data')
plt.plot(A*np.cos(2*np.pi*f*n+phi), 'r--', label='fitted sine')
plt.legend()
pyestimate also implements fitting of a sum of sine waves.
Upvotes: 1
Reputation: 880239
Here is some code implementing some of Zhenya's ideas. It uses
yhat = fftpack.rfft(yReal)
idx = (yhat**2).argmax()
freqs = fftpack.rfftfreq(N, d = (xReal[1]-xReal[0])/(2*pi))
frequency = freqs[idx]
to guess the main frequency of the data, and
amplitude = yReal.max()
to guess the amplitude.
import numpy as np
import scipy.optimize as optimize
import scipy.fftpack as fftpack
import matplotlib.pyplot as plt
pi = np.pi
plt.figure(figsize = (15, 5))
# generate a perfect data set (my real data have tiny error)
def mysine(x, a1, a2, a3):
return a1 * np.sin(a2 * x + a3)
N = 5000
xmax = 10
xReal = np.linspace(0, xmax, N)
a1 = 200.
a2 = 2*pi/10.5 # omega, 10.5 is the period
a3 = np.deg2rad(10.) # 10 degree phase offset
print(a1, a2, a3)
yReal = mysine(xReal, a1, a2, a3) + 0.2*np.random.normal(size=len(xReal))
yhat = fftpack.rfft(yReal)
idx = (yhat**2).argmax()
freqs = fftpack.rfftfreq(N, d = (xReal[1]-xReal[0])/(2*pi))
frequency = freqs[idx]
amplitude = yReal.max()
guess = [amplitude, frequency, 0.]
print(guess)
(amplitude, frequency, phase), pcov = optimize.curve_fit(
mysine, xReal, yReal, guess)
period = 2*pi/frequency
print(amplitude, frequency, phase)
xx = xReal
yy = mysine(xx, amplitude, frequency, phase)
# plot the real data
plt.plot(xReal, yReal, 'r', label = 'Real Values')
plt.plot(xx, yy , label = 'fit')
plt.legend(shadow = True, fancybox = True)
plt.show()
yields
(200.0, 0.5983986006837702, 0.17453292519943295) # (a1, a2, a3)
[199.61981404516041, 0.61575216010359946, 0.0] # guess
(200.06145097308041, 0.59841420869261097, 0.17487141943703263) # fitted parameters
Notice that by using fft, the guess for the frequency is already pretty close to final fitted parameter.
It seems you do not need to fix any of the parameters.
By making the frequency guess closer to the actual value, optimize.curve_fit
is able to converge to a reasonable answer.
Upvotes: 2
Reputation: 26070
From what I can see from playing a bit with leastsq
(without fancy stuff from the cookbook, just plain direct calls to leastsq
--- and by the way, full_output=True
is your friend here), is that it's very hard to fit all three of the amplitude, frequency and phase in one go. On the other hand, if I fix the amplitude and fit the frequency and phase, it works; if I fix the frequency and fit the amplitude and phase, it works too.
There is more than one way out here. What might be the simplest one --- if you are sure you only have one sine wave (and this is easy to check with the Fourier transform), then you know the frequency from just the distance between consecutive maxima of your signal. Then fit the two remaining parameters.
If what you have is a mixture of several harmonics, well, again, Fourier transform will tell you that.
Upvotes: 2