Watts
Watts

Reputation: 3

Loglikelihood of normal distribution

I'm trying to find maximum likelihood estimate of mu and sigma from normal distribution using minimize function form scipy. However minimazation returns expected value of mean but estimate of sigma is far from real sigma.

I define function llnorm that returns negative log-likelihood of normal distribution, then create random sample from normal distribution with mean 150 and standard deviation 10, then using optimize I am trying to find MLE.

import numpy as np
import math
import scipy.optimize as optimize

def llnorm(par, data):
    n = len(data)
    mu, sigma = par
    ll = -np.sum(-n/2 * math.log(2*math.pi*(sigma**2)) - ((data-mu)**2)/(2 * (sigma**2)))
    return ll

data = 10 * np.random.randn(100) + 150

result = optimize.minimize(llnorm, [150,10], args = (data))

Even though mean of data is close to 150 and std is close to 10, optimazation returns much smaller value of estimated sigma (close to 0).

Upvotes: 0

Views: 10813

Answers (3)

rahoo
rahoo

Reputation: 110

You can also use scipy logpdf:

from scipy.stats import norm
norm(mu, sigma).logpdf(samples).sum()

Upvotes: 2

Aditya Santoso
Aditya Santoso

Reputation: 1071

np.random.randn creates random Gaussian distribution with variance 1 (docs here). Since you aim to have distribution with std of 10, you need to multiply with 10 * 10 instead

import numpy as np
import math
import scipy.optimize as optimize

def llnorm(par, data):
    n = len(data)
    mu, sigma = par
    ll = -np.sum(-n/2 * math.log(2*math.pi*(sigma**2)) - ((data-mu)**2)/(2 * (sigma**2)))
    return ll

data = 10 * 10 * np.random.randn(100) + 150 

result = optimize.minimize(llnorm, [150,10], args = (data))
print(result)

This gives me:

      fun: 36328.17002555693
 hess_inv: array([[ 0.96235834, -0.32116447],
       [-0.32116447,  0.10879383]])
      jac: array([0., 0.])
  message: 'Optimization terminated successfully.'
     nfev: 44
      nit: 8
     njev: 11
   status: 0
  success: True
        x: array([166.27014352,   9.15113937])

EDIT: it seems like the output of ~9 is purely coincidental. Something else needs to be investigated

Upvotes: -1

Julien
Julien

Reputation: 15206

Your math is slightly off:

ll = n*math.log(2*math.pi*(sigma**2))/2 + np.sum(((data-mu)**2)/(2 * (sigma**2)))

or

ll = np.sum(math.log(2*math.pi*(sigma**2))/2 + ((data-mu)**2)/(2 * (sigma**2)))

First I cancel the -'s (not a problem), but above all either you keep the constant term in the sum and don't multiply it by n, or you take it out and multiply it by n,... but not both at the same time.

Upvotes: 5

Related Questions