python curve_fit does not give reasonable fitting result

Question

I am trying to fit a Gaussian to a spectrum whose y values are on the order of 10^(-19). curve_fit gives me a poor fit, both before and after I multiply my whole data by 10^(-19). My code is attached; it is a fairly simple data set, except that the values are very small. If I want to keep my original values, how do I get a reasonable Gaussian fit that gives me the correct parameters?

#get fits data
aaa=pyfits.getdata('p1.cal.fits')

aaa=np.matrix(aaa)
nrow=np.shape(aaa)[0]
ncol=np.shape(aaa)[1]

ylo=79
yhi=90
xlo=0
xhi=1023
glo=430
ghi=470

#sum all the rows to get spectrum
ysum=[]
for x in range(xlo,xhi):
    sum=np.sum(aaa[ylo:yhi,x])
    ysum.append(sum)

wavelen_pix=range(xhi-xlo)
max=np.max(ysum)
print "maximum is at x=", np.where(ysum==max)

##fit gaussian
#fit only part of my data in the chosen range [glo:ghi]
x=wavelen_pix[glo:ghi]
y=ysum[glo:ghi]
def func(x, a, x0, sigma):
    return a*np.exp(-(x-x0)**2/float((2*sigma**2)))

sig=np.std(ysum[500:1000]) #std of background noise

popt, pcov = curve_fit(func, x, sig)
print popt  
#this gives me [1.,1.,1.], which is obviously wrong
gaus=func(x,popt[0],popt[1],popt[2])

aaa is a 153 by 1024 image matrix, partly looks like this:

matrix([[ -8.99793629e-20,   8.57133275e-21,   4.83523386e-20, ...,
-1.54811004e-20,   5.22941515e-20,   1.71179195e-20],
[  2.75769318e-20,   1.03177243e-20,  -3.19634928e-21, ...,
1.66583803e-20,  -9.88712568e-22,  -2.56897725e-20],
[  2.88121935e-20,   8.57964252e-21,  -2.60784327e-20, ...,
1.72335180e-20,  -7.61189937e-21,  -3.45333075e-20],
..., 
[  1.04006903e-20,   1.61200683e-20,   7.04195205e-20, ...,
1.72459645e-20,   4.29404029e-20,   1.99889374e-20],
[  3.22315752e-21,  -5.61394194e-21,   3.28763096e-20, ...,
1.99063583e-20,   2.12989880e-20,  -1.23250648e-21],
[  3.66591810e-20,  -8.08647455e-22,  -6.22773168e-20, ...,
-4.06145681e-21,   4.92453132e-21,   4.23689309e-20]], dtype=float32)

dermen · Accepted Answer

You are calling curve_fit incorrectly. Here is the usage:

curve_fit(f, xdata, ydata, p0=None, sigma=None, absolute_sigma=False, check_finite=True, **kw)

f is your function, whose first arg is an array of independent variables and whose subsequent args are the function parameters (such as amplitude, center, etc.)
xdata are the independent variables
ydata are the dependent variables
p0 is an initial guess at the function parameters (for your Gaussian this is amplitude, center, width)

By default p0 is set to a list of ones [1,1,...], which is probably why you get that as a result. You passed sig (a single number) as ydata instead of y, so the fit never did anything useful and simply handed back its starting point.
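To see why the optimizer can hand back the starting point unchanged, it helps to evaluate the model at the default guess. Here is a small sketch, assuming the pixel window 430 to 470 from the question and ordinary float64 arithmetic. With a = x0 = sigma = 1 the Gaussian underflows to exactly zero everywhere in that window, and it is still zero after a small nudge of the parameters, so the residuals do not change and the optimizer has no direction to move in.

```python
import numpy as np

def func(x, a, x0, sigma):
    # Same Gaussian model as in the question
    return a * np.exp(-(x - x0)**2 / (2.0 * sigma**2))

# Pixel window used in the question (glo=430, ghi=470)
x = np.arange(430, 470, dtype=float)

# Model evaluated at curve_fit's default starting point p0 = [1, 1, 1]
model_at_default = func(x, 1.0, 1.0, 1.0)

# Slightly perturbed parameters, similar in spirit to a finite-difference step
model_perturbed = func(x, 1.0 + 1e-6, 1.0 + 1e-6, 1.0 + 1e-6)

# Both are True: the model is flat (all zeros) around the default guess
flat_at_default = bool(np.all(model_at_default == 0.0))
flat_when_perturbed = bool(np.all(model_perturbed == 0.0))
```

This is why a starting guess near the actual peak matters even once ydata is passed correctly.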

Try estimating the amplitude, center, and width from the data, then make a p0 object (see below for details)

init_guess = ( a_i, x0_i, sig_i) # same order as they are supplied to your function
popt, pcov = curve_fit(func, xdata=x,ydata=y,p0=init_guess)

Here is a short example

xdata = np.linspace(0, 4, 50)
mygauss = ( 10,2,0.5) #( amp, center, width)
y     = func(xdata, *mygauss  ) # using your func defined above    
ydata = y + 2*(np.random.random(50)- 0.5) # add some noise to create fake data

Now I can guess the fit params

ai    = np.max( ydata) # guess the amplitude
xi    = xdata[ np.argmax( ydata)] # guess the position of center

Guessing the width is tricky. I would first find where the half maximum is located (there are two such points, but you only need to find one, as the Gaussian is symmetric):

pos_half = np.argmin( np.abs( ydata-ai/2 ) ) # subtract half the amplitude and find the minimum

Now evaluate how far this is from the center of the gaussian (xi) :

sig_i = np.abs( xi - xdata[ pos_half] ) # estimate the width
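Strictly speaking, the distance from the peak to the half-maximum point is the half width at half maximum (HWHM), which for a Gaussian equals sigma*sqrt(2 ln 2), about 1.18 times sigma. Using it directly as sig_i is fine for an initial guess, but the conversion is cheap. A minimal sketch, assuming a single peak on a zero baseline; the helper name guess_sigma is made up for illustration:

```python
import numpy as np

def guess_sigma(xdata, ydata):
    # Hypothetical helper: estimate sigma from the half-maximum point
    ai = np.max(ydata)                            # peak height
    xi = xdata[np.argmax(ydata)]                  # peak position
    pos_half = np.argmin(np.abs(ydata - ai / 2))  # sample closest to half maximum
    hwhm = np.abs(xi - xdata[pos_half])           # half width at half maximum
    return hwhm / np.sqrt(2 * np.log(2))          # HWHM = sigma * sqrt(2 ln 2)

# Noise-free Gaussian with known sigma = 0.5 to check the estimate
xdata = np.linspace(0, 4, 401)
ydata = 10 * np.exp(-(xdata - 2)**2 / (2 * 0.5**2))
sigma_guess = guess_sigma(xdata, ydata)
```

On noisy data the estimate is only as good as the sampling and the noise allow, which is all an initial guess needs.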

Now you can make the initial guess

init_guess = (ai, xi, sig_i)

and fit

params, variance = curve_fit( func, xdata=xdata, ydata=ydata, p0=init_guess)
print params
#array([ 9.99457443,  2.01992858,  0.49599629])

which is very close to mygauss. Hope it helps.
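For data as small as in the question (around 1e-19), one common workaround is to rescale the y values to order unity before fitting and convert the amplitude back afterwards; the center and width are not affected by the scaling. A minimal sketch on synthetic data, assuming a single peak on a zero baseline. The true parameters, noise level, and random seed below are made up for illustration and are not taken from the real spectrum:

```python
import numpy as np
from scipy.optimize import curve_fit

def func(x, a, x0, sigma):
    # Gaussian model from the question
    return a * np.exp(-(x - x0)**2 / (2.0 * sigma**2))

# Synthetic spectrum with tiny values (illustrative numbers, not the real data)
rng = np.random.RandomState(0)
x = np.arange(430, 470, dtype=float)
true_params = (5e-19, 450.0, 4.0)   # (amplitude, center, width)
y = func(x, *true_params) + 2e-20 * rng.normal(size=x.size)

# Rescale so the peak is of order 1
scale = np.max(np.abs(y))
y_scaled = y / scale

# Initial guess from the scaled data, in the order func expects
ai = np.max(y_scaled)
xi = x[np.argmax(y_scaled)]
pos_half = np.argmin(np.abs(y_scaled - ai / 2))
sig_i = max(np.abs(xi - x[pos_half]), 1.0)  # avoid a zero-width guess

popt, pcov = curve_fit(func, x, y_scaled, p0=(ai, xi, sig_i))

# Convert the amplitude back to the original units; center and width are unchanged
a_fit, x0_fit, sigma_fit = popt[0] * scale, popt[1], abs(popt[2])
```

The abs() on the width is there because the model only depends on sigma squared, so the optimizer may return either sign.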
