Reputation: 1599
I have fitted a normal distribution to some data using the fit method of the scipy.stats.norm object, like so:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

x = np.random.normal(size=50000)

fig, ax = plt.subplots()
nbins = 75

# Maximum-likelihood estimates of the mean and standard deviation
mu, sigma = norm.fit(x)

# normed=1 was removed from matplotlib; density=True is the replacement
n, bins, patches = ax.hist(x, nbins, density=True, facecolor='grey', alpha=0.5, label='before')

# mlab.normpdf was also removed from matplotlib; use scipy's norm.pdf instead
y0 = norm.pdf(bins, mu, sigma)  # line of best fit
ax.plot(bins, y0, 'k--', linewidth=2, label='fit before')
ax.set_title(r'$\mu$={}, $\sigma$={}'.format(mu, sigma))
plt.show()
I would now like to extract the uncertainty/error in the fitted mu and sigma values. How can I go about this?
Upvotes: 11
Views: 12152
Reputation: 1460
You can use scipy.optimize.curve_fit. This method returns not only the estimated optimal values of the parameters, but also the corresponding covariance matrix:
popt : array
    Optimal values for the parameters so that the sum of the squared residuals of f(xdata, *popt) - ydata is minimized.
pcov : 2d array
    The estimated covariance of popt. The diagonals provide the variance of the parameter estimate. To compute one standard deviation errors on the parameters, use perr = np.sqrt(np.diag(pcov)).
    How the sigma parameter affects the estimated covariance depends on the absolute_sigma argument, as described in the curve_fit documentation.
    If the Jacobian matrix at the solution does not have full rank, the 'lm' method returns a matrix filled with np.inf, whereas the 'trf' and 'dogbox' methods use the Moore-Penrose pseudoinverse to compute the covariance matrix.
You can calculate the standard deviation errors of the parameters from the square roots of the diagonal elements of the covariance matrix as follows:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
from scipy.optimize import curve_fit

x = np.random.normal(size=50000)

fig, ax = plt.subplots()
nbins = 75
n, bins, patches = ax.hist(x, nbins, density=True, facecolor='grey', alpha=0.5, label='before')

# Fit the normal pdf to the bin centres rather than the bin edges
centers = 0.5 * (bins[1:] + bins[:-1])
pars, cov = curve_fit(lambda x, mu, sig: norm.pdf(x, loc=mu, scale=sig), centers, n, p0=[0, 1])

ax.plot(centers, norm.pdf(centers, *pars), 'k--', linewidth=2, label='fit before')

# The square roots of the diagonal elements of the covariance matrix are
# the one-standard-deviation errors on the fitted parameters
ax.set_title(r'$\mu={:.4f}\pm{:.4f}$, $\sigma={:.4f}\pm{:.4f}$'.format(
    pars[0], np.sqrt(cov[0, 0]), pars[1], np.sqrt(cov[1, 1])))
plt.show()
The resulting plot shows the histogram with the fitted curve, and the title gives the fitted parameters together with their one-standard-deviation errors.
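For completeness, the one-standard-deviation errors on all fitted parameters can also be pulled out in one step, as the curve_fit documentation quoted above suggests (a minimal sketch, reusing the pars and cov variables from the code above):

# One-standard-deviation errors for all fitted parameters at once
perr = np.sqrt(np.diag(cov))
print('mu    = {:.4f} +/- {:.4f}'.format(pars[0], perr[0]))
print('sigma = {:.4f} +/- {:.4f}'.format(pars[1], perr[1]))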
Upvotes: 10
Reputation: 7862
See also lmfit (https://github.com/lmfit/lmfit-py), which gives an easier interface and reports uncertainties in fitted variables. To fit data to a normal distribution, see http://lmfit.github.io/lmfit-py/builtin_models.html#example-1-fit-peak-data-to-gaussian-lorentzian-and-voigt-profiles and use something like:
from lmfit.models import GaussianModel

# xdata, ydata: the x values and amplitudes of the data to be fitted
model = GaussianModel()

# create parameters with initial guesses:
params = model.make_params(center=9, amplitude=40, sigma=1)

result = model.fit(ydata, params, x=xdata)
print(result.fit_report())
The report will include the 1-sigma errors, like this:
[[Variables]]
    sigma:     1.23218358 +/- 0.007374 (0.60%) (init= 1.0)
    center:    9.24277047 +/- 0.007374 (0.08%) (init= 9.0)
    amplitude: 30.3135620 +/- 0.157126 (0.52%) (init= 40.0)
    fwhm:      2.90157055 +/- 0.017366 (0.60%)  == '2.3548200*sigma'
    height:    9.81457817 +/- 0.050872 (0.52%)  == '0.3989423*amplitude/max(1.e-15, sigma)'
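If you need the numbers programmatically rather than as a printed report, each fitted parameter also carries its best-fit value and standard error (a minimal sketch, reusing the result object from the fit above):

# Best-fit value and 1-sigma uncertainty for each parameter
for name, par in result.params.items():
    print('{}: {} +/- {}'.format(name, par.value, par.stderr))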
Upvotes: 3