maelstromscientist
maelstromscientist

Reputation: 551

Errors on a Gaussian histogram curve fit using scipy.curve_fit()

I am trying to get the fit errors of a Gaussian fit of a histogram. I am using scipy.optimize.curve_fit in the following code:

import matplotlib.pylab as plt 
from pylab import exp
import numpy as np 
from scipy import optimize
from math import sqrt

# Fit functions
def Gaussian(x,a,b,c):
    return a * exp(-(x - b)**2.0 / (2 * c**2))

# Generate data from random Guassian distribution
npix = 10200
nbins = int(sqrt(npix))
data = np.random.standard_normal(npix)
print('\n Length of y: %s' % len(data))
n,bins,patches = plt.hist(data,bins=nbins)

# Generate data from bins as a set of points 
bin_size = abs(bins[1]-bins[0])
x =np.linspace(start=bins[0]+bin_size/2.0,stop=bins[-2]+bin_size/2.0,\
num=nbins,endpoint=True)
print min(x),max(x),len(x), np.mean(x)
y = n
y[y==0]= 1e-8


popt, pcov = optimize.curve_fit(Gaussian,x,y) 

# Curve-fit error method
error = [] 
for i in range(len(popt)):
    try:
      error.append( np.absolute(pcov[i][i])**0.5)
    except:
      error.append( 0.00 )
pfit_curvefit = popt
perr_curvefit = np.array(error) 
print('\n Curve-fit Curve fit: %s' % pfit_curvefit)
print('\n Curve-fit Fit errors: %s' % perr_curvefit)

# Plot the fit
x_fit = np.linspace(x[0], x[-1], nbins)
y_gauss = Gaussian(x_fit, *popt)
# y_boot = Gaussian(x_fit, *pfit_bootstrap)
yerr=Gaussian(x_fit,*perr_curvefit)

plt.plot(x_fit, y_gauss,linestyle='--',linewidth=2,\
color='red',label='Gaussian')



plt.xlabel('Pixel Values')
plt.ylabel('Frequency')
plt.title('Npix = %s, Nbins = %s'% (npix,nbins))
plt.legend()
plt.show()

enter image description here

As you can see, I can get Python to adequately fit the histogram data no problem. The problem arises when I try to compute the fit error

yerr=Gaussian(x_fit,*perr_curvefit)

This seems like it would be the right thing to do, however when I look at this list of errors it looks non-sensical:

...
0.0
0.0
0.0
0.0
0.0
2.60905702842e-265
2.27384038589e-155
1.02313435685e-74
2.37684931814e-23
0.285080112094
1.76534048255e-08
5.64399121475e-45
9.31623567809e-111
7.93945868459e-206
0.0
0.0
0.0
0.0
0.0
...

My questions are: 1. Were the errors in the fit computed correctly, and if not, what would be the correct way to compute them. 2. I need the errors in the fit to compute a reduced chi-squared value. Is there another way I can compute chi-squared without having to know the error at each point in the fit?

Thank you in advance!

Upvotes: 0

Views: 3809

Answers (1)

Jim Parker
Jim Parker

Reputation: 1145

Your main error is the calculation of the error values. Your array error is asymptotic estimate of the standard deviations for the coefficients a,b,c in the function Gaussian. You can't then input a value x and uncertainty +/-a, +/-b, +/-c and get meaningful results, because the error is about the mean of a,b & c, i.e., Gaussian(x, a+/- delta a, etc.)

If you are not wedded to the use of optimize.curve_fit() and would use optimize.leastsq(), the information you want is readily available.
See this question

replace

popt, pcov = optimize.curve_fit(Gaussian,x,y)

with

def residual(p, x, y):
    return Gaussian(x, *p) - y

initGuess = [1,1,1] # or whatever you want the search to start at
popt, pcov, infodict, mesg, ier = optimize.least_squares(residual,initGuess, args=[x,y], full_output=True)

and then follow the instructions in the solution to find the reduced chi square

s_sq = (infodict['fvec']**2).sum()/ (N-n)

Upvotes: 1

Related Questions