Milán Janosov
Milán Janosov

Reputation: 86

Scipy MLE fit of a normal distribution

I was trying to adopt this solution proposed in this thread to determine the parameters of a simple normal distribution. Even though the modifications are minor (based on wikipedia), the result is pretty off. Any suggestion where it goes wrong?

import math
import numpy as np
from scipy.optimize import minimize
import matplotlib.pyplot as plt

def gaussian(x, mu, sig):
    return 1./(math.sqrt(2.*math.pi)*sig)*np.exp(-np.power((x - mu)/sig, 2.)/2)

def lik(parameters):

    mu    = parameters[0]
    sigma = parameters[1]    
    n     = len(x)  
    L     = n/2.0 * np.log(2 * np.pi) + n/2.0 * math.log(sigma **2 ) + 1/(2*sigma**2) * sum([(x_ - mu)**2 for x_ in x ])

    return L



mu0    = 10
sigma0 = 2

x = np.arange(1,20, 0.1)
y = gaussian(x, mu0, sigma0)


lik_model = minimize(lik, np.array([5,5]), method='L-BFGS-B')


mu    = lik_model['x'][0]
sigma = lik_model['x'][1]

print lik_model

plt.plot(x, gaussian(x, mu, sigma), label = 'fit')
plt.plot(x, y, label = 'data')
plt.legend()

Output of the fit:

jac: array([2.27373675e-05, 2.27373675e-05])

message: 'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'

success: True

x: array([10.45000245, 5.48475283])

curve and fit

Upvotes: 1

Views: 7863

Answers (1)

Warren Weckesser
Warren Weckesser

Reputation: 114781

The maximum likelihood method is for fitting the parameters of a distribution to a set of values that are purportedly a random sample from that distribution. In your lik function, you use x to hold the sample, but x is a global variable that you have set to x = np.arange(1,20, 0.1). That is definitely not a random sample from a normal distribution.

Because you are using the normal distribution, you can use the known formulas for the maximum likelihood estimate to check your computation: mu is the sample mean, and sigma is the sample standard deviation:

In [17]: x.mean()
Out[17]: 10.450000000000006

In [18]: x.std()
Out[18]: 5.484751589634671

Those value matches the result of your call to minimize pretty closely, so it looks like your code is working.

To modify your code to use MLE in the way you expected it to work, x should be a collection of values that are purportedly a random sample from a normal distribution. Note that your array y is not such a sample. It is the value of the probability density function (PDF) on a grid. If fitting the distribution to a sample of the PDF is your actual goal, you can use an curve-fitting function such as scipy.optimize.curve_fit. If fitting the normal distribution parameters to a random sample is, in fact, what you want to do, then to test your code, you should use an input that is a reasonably large sample from a distribution with known parameters. In this case, you can do

x = np.random.normal(loc=mu0, scale=sigma0, size=20)

When I use such an x in your code, I get

In [20]: lik_model.x
Out[20]: array([ 9.5760996 ,  2.01946582])

As expected, the values in the solution are approximately 10 and 2.

(If you use x for your sample as I did, you'll have to change your plotting code accordingly.)

Upvotes: 5

Related Questions