iron2man
iron2man

Reputation: 1827

Python - curve fit producing incorrect fit

I'm trying to fit a sine wave curve this data distribution, but for some reason, the fit is incorrect:

import matplotlib.pyplot as plt
import numpy as np
import scipy as sp
from scipy.optimize import curve_fit





#=======================
#====== Analysis =======
#=======================

# sine curve fit
def fit_Sin(t, A, b, C):
    return A* np.sin(t*b) + C

## The Data extraciton
t,y,y1 = np.loadtxt("new10_CoCore_5to20_BL.txt", unpack=True)

xdata = t
popt, pcov = curve_fit(fit_Sin, t, y)
print "A = %s , b = %s, C = %s" % (popt[0], popt[1], popt[2])



#=======================
#====== Plotting =======
#=======================

fig1 = plt.figure()
ax1 = fig1.add_subplot(111)

ax1.plot(t, y, ".")
ax1.plot(t, fit_Sin(t, *popt))


plt.show()

enter image description here

In which this fit makes an extreme underestimation of the data. Any ideas why that is?

Here is the data provided here: https://www.dropbox.com/sh/72jnpkkk0jf3sjg/AAAb17JSPbqhQOWnI68xK7sMa?dl=0

Any idea why this is producing this?

Upvotes: 3

Views: 1274

Answers (1)

Mad Physicist
Mad Physicist

Reputation: 114310

Sine waves are extremely difficult to fit if your frequency guess is off. That is because with a sufficient number of cycles in the data, the guess will be out of phase with half the data and in phase with half of it for even a small error in the frequency. At that point, a straight line offers a better fit than a sine wave of different frequency. That is how Fourier transforms work by the way.

I can think of three ways to estimate the frequency well enough to allow a non linear least squares algorithm to take over:

  1. Eyeball it. Subtract the x-values of two peaks in the GUI or even in the command line. If you have very low noise data, you can automate this process quite easily.
  2. Use a Discrete Fourier transform. If your data is a sine wave of one component, the first non-constant peak will give you the frequency. I have found this to require some additional tweaking since the frequency of the sampling is often not a multiple of the sine wave frequency. A parabolic fit to the three points around the peak (three including the peak) can help in this situation.
  3. Find where your data crosses the vertical offset. This is similar to #1 but is easier to automate for relatively non-noisy data. The wavelength is twice the distance between a pair of intersections.

Using #1, I can clearly see that your wavelength is 50. The initial guess for b should therefore be 2*np.pi/50. Also, don't forget to add a phase shift parameter to allow the fit to slide horizontally: A*sin(b*t + d) + C.

You will need to pass in an initial guess via the p0 parameter to curve_fit. A good eyeball estimate is p0=(0.55, np.pi/25, 0.0, -np.pi/25*12.5). The phase shift in your data appears to be a quarter period to the right, hence the 12.5.

I am currently in the process of writing an algorithm for fitting noisy sine waves with a single frequency component that I will submit to SciPy. Will update when I finish.

Upvotes: 3

Related Questions