Python - curve fit producing incorrect fit

Question

I'm trying to fit a sine wave curve this data distribution, but for some reason, the fit is incorrect:

import matplotlib.pyplot as plt
import numpy as np
import scipy as sp
from scipy.optimize import curve_fit





#=======================
#====== Analysis =======
#=======================

# sine curve fit
def fit_Sin(t, A, b, C):
    return A* np.sin(t*b) + C

## The Data extraciton
t,y,y1 = np.loadtxt("new10_CoCore_5to20_BL.txt", unpack=True)

xdata = t
popt, pcov = curve_fit(fit_Sin, t, y)
print "A = %s , b = %s, C = %s" % (popt[0], popt[1], popt[2])



#=======================
#====== Plotting =======
#=======================

fig1 = plt.figure()
ax1 = fig1.add_subplot(111)

ax1.plot(t, y, ".")
ax1.plot(t, fit_Sin(t, *popt))


plt.show()

In which this fit makes an extreme underestimation of the data. Any ideas why that is?

Here is the data provided here: https://www.dropbox.com/sh/72jnpkkk0jf3sjg/AAAb17JSPbqhQOWnI68xK7sMa?dl=0

Any idea why this is producing this?

Mad Physicist · Accepted Answer

Sine waves are extremely difficult to fit if your frequency guess is off. That is because with a sufficient number of cycles in the data, the guess will be out of phase with half the data and in phase with half of it for even a small error in the frequency. At that point, a straight line offers a better fit than a sine wave of different frequency. That is how Fourier transforms work by the way.

I can think of three ways to estimate the frequency well enough to allow a non linear least squares algorithm to take over:

Eyeball it. Subtract the x-values of two peaks in the GUI or even in the command line. If you have very low noise data, you can automate this process quite easily.
Use a Discrete Fourier transform. If your data is a sine wave of one component, the first non-constant peak will give you the frequency. I have found this to require some additional tweaking since the frequency of the sampling is often not a multiple of the sine wave frequency. A parabolic fit to the three points around the peak (three including the peak) can help in this situation.
Find where your data crosses the vertical offset. This is similar to #1 but is easier to automate for relatively non-noisy data. The wavelength is twice the distance between a pair of intersections.

Using #1, I can clearly see that your wavelength is 50. The initial guess for b should therefore be 2*np.pi/50. Also, don't forget to add a phase shift parameter to allow the fit to slide horizontally: A*sin(b*t + d) + C.

You will need to pass in an initial guess via the p0 parameter to curve_fit. A good eyeball estimate is p0=(0.55, np.pi/25, 0.0, -np.pi/25*12.5). The phase shift in your data appears to be a quarter period to the right, hence the 12.5.

I am currently in the process of writing an algorithm for fitting noisy sine waves with a single frequency component that I will submit to SciPy. Will update when I finish.

Python - curve fit producing incorrect fit

Answers (1)

Related Questions