Why does maximum likelihood parameters estimation for scipy.stats distributions perform so poor sometimes?

Question

I have a set of experimental values and I want to find the function that describes their distribution better. But in the process of tinkering with some functions, I discovered that scipy.optimize.curve_fit and scipy.stats.rv_continuous.fit give very different results, usually not in favor for the latter. Here is a simple example:

#!/usr/bin/env python3
import numpy as np
from scipy.optimize import curve_fit as fit
from scipy.stats import gumbel_r, norm
import matplotlib.pyplot as plt

amps = np.loadtxt("pyr_11.txt")*-1000 # http://pastebin.com/raw.php?i=uPK31JGE
argsGumbel0 = gumbel_r.fit(amps)
argsGauss0 = norm.fit(amps)
bins = np.arange(60)
probs, binedges = np.histogram(amps, bins=bins, normed=True)
bincenters = 0.5*(binedges[1:]+binedges[:-1])
argsGumbel1 = fit(gumbel_r.pdf, bincenters, probs, p0=argsGumbel0)[0]
argsGauss1 = fit(norm.pdf, bincenters, probs, p0=argsGauss0)[0]

plt.figure()
plt.hist(amps, bins=bins, normed=True, color='0.5')
xes = np.arange(0, 60, 0.1)
plt.plot(xes, gumbel_r.pdf(xes, *argsGumbel0), linewidth=2, label='Gumbel, maximum likelihood')
plt.plot(xes, gumbel_r.pdf(xes, *argsGumbel1), linewidth=2, label='Gumbel, least squares')
plt.plot(xes, norm.pdf(xes, *argsGauss0), linewidth=2, label='Gauss, maximum likelihood')
plt.plot(xes, norm.pdf(xes, *argsGauss1), linewidth=2, label='Gauss, least squares')
plt.legend(loc='upper right')
plt.show()

enter image description here

The difference in performance varies from dramatic to mild, but in my case it is always present. Why is that so? How do I choose the most appropriate optimisation method for the case?

Why does maximum likelihood parameters estimation for scipy.stats distributions perform so poor sometimes?

Answers (1)

Related Questions