Axon

Reputation: 557

Why does maximum likelihood parameter estimation for scipy.stats distributions perform so poorly sometimes?

I have a set of experimental values and I want to find the function that best describes their distribution. While tinkering with some functions, I discovered that scipy.optimize.curve_fit and scipy.stats.rv_continuous.fit give very different results, usually not in favor of the latter. Here is a simple example:

#!/usr/bin/env python3
import numpy as np
from scipy.optimize import curve_fit as fit
from scipy.stats import gumbel_r, norm
import matplotlib.pyplot as plt

amps = np.loadtxt("pyr_11.txt")*-1000 # http://pastebin.com/raw.php?i=uPK31JGE
# Maximum likelihood estimates of (loc, scale)
argsGumbel0 = gumbel_r.fit(amps)
argsGauss0 = norm.fit(amps)
bins = np.arange(60)
# density=True normalizes the histogram so that it integrates to 1
probs, binedges = np.histogram(amps, bins=bins, density=True)
bincenters = 0.5*(binedges[1:]+binedges[:-1])
# Least-squares fit of each PDF to the histogram heights, seeded with the ML estimates
argsGumbel1 = fit(gumbel_r.pdf, bincenters, probs, p0=argsGumbel0)[0]
argsGauss1 = fit(norm.pdf, bincenters, probs, p0=argsGauss0)[0]

plt.figure()
plt.hist(amps, bins=bins, density=True, color='0.5')
xes = np.arange(0, 60, 0.1)
plt.plot(xes, gumbel_r.pdf(xes, *argsGumbel0), linewidth=2, label='Gumbel, maximum likelihood')
plt.plot(xes, gumbel_r.pdf(xes, *argsGumbel1), linewidth=2, label='Gumbel, least squares')
plt.plot(xes, norm.pdf(xes, *argsGauss0), linewidth=2, label='Gauss, maximum likelihood')
plt.plot(xes, norm.pdf(xes, *argsGauss1), linewidth=2, label='Gauss, least squares')
plt.legend(loc='upper right')
plt.show()

[Plot: histogram of amps with the four fitted PDFs overlaid]

The difference in performance varies from dramatic to mild, but in my case it is always present. Why is that? How do I choose the most appropriate optimisation method for a given case?

Upvotes: 2

Views: 1952

Answers (1)

user3371637


Don't take this entirely as an answer; I'm posting it here only because I don't have enough reputation to comment. The poor performance is not because scipy does anything wrong, but because the model itself doesn't represent the data. In this case maximum likelihood is driven mostly by the mean, while least squares tries to stay close to the histogram curve. That's why the Gaussian maximum likelihood fit performs badly: it doesn't follow the whole shape of the data, only a few summary properties of the distribution (for a Gaussian, the sample mean and standard deviation).
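To illustrate the point about the Gaussian fit, here is a minimal sketch. It uses a synthetic right-skewed sample (drawn from a Gumbel distribution with made-up parameters, which is my assumption and not your data) and checks that norm.fit simply returns the sample mean and standard deviation, regardless of where the histogram peaks.

```python
import numpy as np
from scipy.stats import gumbel_r, norm

# Synthetic right-skewed sample; loc=20 and scale=5 are illustrative
# assumptions, standing in for the real measurements.
rng = np.random.default_rng(0)
sample = gumbel_r.rvs(loc=20.0, scale=5.0, size=5000, random_state=rng)

# For a normal model the maximum likelihood estimates have a closed form:
# loc is the sample mean, scale is the (ddof=0) sample standard deviation.
mle_loc, mle_scale = norm.fit(sample)

print("norm.fit loc   :", mle_loc, " sample mean:", sample.mean())
print("norm.fit scale :", mle_scale, " sample std :", sample.std())

# The peak of this sample sits near the Gumbel location (20), but the
# skewed tail pulls the mean, and therefore the fitted Gaussian, to the right.
print("offset of fitted Gaussian centre from the peak:", mle_loc - 20.0)
```

With skewed data the fitted Gaussian is therefore centred on the mean rather than on the peak of the histogram, whereas a least-squares fit to the bin heights is free to move towards the peak.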

For your problem I would recommend using a Landau distribution for fitting.

Upvotes: 1
