Kantura
Kantura

Reputation: 6523

Fitting gaussian to a curve in Python II

I have two lists .

import numpy
x = numpy.array([7250, ... list of 600 ints ... ,7849])
y = numpy.array([2.4*10**-16, ... list of 600 floats ... , 4.3*10**-16])

They make a U shaped curve. Now I want to fit a gaussian to that curve.

from scipy.optimize import curve_fit
n = len(x)
mean = sum(y)/n
sigma = sum(y - mean)**2/n

def gaus(x,a,x0,sigma,c):
    return a*numpy.exp(-(x-x0)**2/(2*sigma**2))+c

popt, pcov = curve_fit(gaus,x,y,p0=[-1,mean,sigma,-5])

pylab.plot(x,y,'r-')
pylab.plot(x,gaus(x,*popt),'k-')
pylab.show()

I just end up with the noisy original U-shaped curve and a straight horizontal line running through the curve.

I am not sure what the -1 and the -5 represent in the above code but I am sure that I need to adjust them or something else to get the gaussian curve. I have been playing around with possible values but to no avail.

Any ideas?

Upvotes: 0

Views: 3302

Answers (5)

MiB_Coder
MiB_Coder

Reputation: 915

Maybe it is because I use matlab and fminsearch or my fits have to work on much fewer datapoints (~ 5-10), I have much better results with the following starter values (as simple as they are):

a = max(y)-min(y);
imax= find(y==max(y),1);
mean = x(imax);
avg = sum(x.*y)./sum(y);
sigma = sqrt(abs(sum((x-avg).^2.*y) ./ sum(y)));
c = min(y);

The sigma works fine.

Upvotes: 0

Tim Rausch
Tim Rausch

Reputation: 21

I don't think you are estimating your initial guesses for mean and sigma correctly.

Take a look at the SciPy Cookbook here

I think it should look like this.

x = numpy.array([7250, ... list of 600 ints ... ,7849])
y = numpy.array([2.4*10**-16, ... list of 600 floats ... , 4.3*10**-16])

n = len(x)
mean = sum(x*y)/sum(y)
sigma = sqrt(abs(sum((x-mean)**2*y)/sum(y)))

def gaus(x,a,x0,sigma,c):
   return a*numpy.exp(-(x-x0)**2/(2*sigma**2))+c

popy, pcov = curve_fit(gaus,x,y,p0=[-max(y),mean,sigma,min(x)+((max(x)-min(x)))/2])

pylab.plot(x,gaus(x,*popt))

If anyone has a link to a simple explanation why these are the correct moments I would appreciate it. I am going on faith that SciPy Cookbook got it right.

Upvotes: 1

Kantura
Kantura

Reputation: 6523

Here is the solution thanks to everyone .

x = numpy.array([7250, ... list of 600 ints ... ,7849])
y = numpy.array([2.4*10**-16, ... list of 600 floats ... , 4.3*10**-16])

n = len(x)
mean = sum(x)/n
sigma = math.sqrt(sum((x-mean)**2)/n)

def gaus(x,a,x0,sigma,c):
    return a*numpy.exp(-(x-x0)**2/(2*sigma**2))+c

popy, pcov = curve_fit(gaus,x,y,p0=[-max(y),mean,sigma,min(x)+((max(x)-min(x)))/2])

pylab.plot(x,gaus(x,*popt))

Upvotes: 0

ev-br
ev-br

Reputation: 26110

First of all, your variable sigma is actually variance, i.e. sigma squared --- http://en.wikipedia.org/wiki/Variance#Definition. This confuses the curve_fit by giving it a suboptimal starting estimate.

Then, your fitting ansatz, gaus, includes an amplitude a and an offset, is this what you actually need? And the starting values are a=-1 (negated bell shape) and offset c=-5. Where do they come from?

Here's what I'd do:

  • fix your fitting model. Do you want just a gaussian, does it need to be normalized. If it does, then the amplitude a is fixed by sigma etc.
  • Have a look at the actual data. What's the tail (offset), what's the sign (amplitude sign).

If you're actually want just a gaussian without any bells and whistles, you might not actually need curve_fit: a gaussian is fully defined by two first moments, mean and sigma. Calculate them as you do, plot them over the data and see if you're not all set.

Upvotes: 2

cgeroux
cgeroux

Reputation: 192

p0 in your call to curve_fit gives the initial guesses for the additional parameters of you function in addition to x. In the above code you are saying that I want the curve_fit function to use -1 as the initial guess for a, -5 as the initial guess for c, mean as the initial guess for x0, and sigma as the guess for sigma. The curve_fit function will then adjust these parameters to try and get a better fit. The problem is your initial guesses at your function parameters are really bad given the order of (x,y)s.

Think a little bit about the order of magnitude of your different parameters for the Gaussian. a should be around the size of your y values (10**-16) as at the peak of the Gaussian the exponential part will never be larger than 1. x0 will give the position within your x values at which the exponential part of your Gaussian will be 1, so x0 should be around 7500, probably somewhere in the centre of your data. Sigma indicates the width, or spread of your Gaussian, so perhaps something in the 100's just a guess. Finally c is just an offset to shift the whole Gaussian up and down.

What I would recommend doing, is before fitting the curve, pick some values for a, x0, sigma, and c that seem reasonable and just plot the data with the Gaussian, and play with a, x0, sigma, and c until you get something that looks at least some what the way you want the Gaussian to fit, then use those as the starting points for curve_fit p0 values. The values I gave should get you started, but may not do exactly what you want. For instance a probably needs to be negative if you want to flip the Gaussian to get a "U" shape.

Also printing out the values that curve_fit thinks are good for your a,x0,sigma, and c might help you see what it is doing and if that function is on the right track to minimizing the residual of the fit.

I have had similar problems doing curve fitting with gnuplot, if the initial values are too far from what you want to fit it goes in completely the wrong direction with the parameters to minimize the residuals, and you could probably do better by eye. Think of these functions as a way to fine tune your by eye estimates of these parameters.

hope that helps

Upvotes: 1

Related Questions