user2967087
user2967087

Reputation: 145

Using scipy optimize for MLE estimate and curve fitting

I randomly generated 1000 data points using the weights I know are true for the normal distribution. Now I am trying to minimize the -log likelihood function to estimate the values of sig^2 and the weights. I sort of get the process conceptually, but when I try to code it I'm just lost.

This is my model:

p(y|x, w, sig^2) = N(y|w0+w1x+...+wnx^n, sig^2)

I've been googling for a while now and I've learned the scipy.stats.optimize.minimize function is good for this, but I can't get it to work right. Every solution I have tried has worked for the example I got the solution from, but I'm unable to extrapolate it to my problem.

x = np.linspace(0, 1000, num=1000)
data = []
for y in x:
        data.append(np.polyval([.5, 1, 3], y))

#plot to confirm I do have a normal distribution...
data.sort()
pdf = stats.norm.pdf(data, np.mean(data), np.std(data))
plt.plot(test, pdf)
plt.show()

#This is where I am stuck.
logLik = -np.sum(stats.norm.logpdf(data, loc=??, scale=??))

I have found that the equation error(w) = .5*sum(poly(x_n, w) - y_n)^2 is relevant for minimizing the error of the weights, which therefore maximizes my likelihood for the weights, but I don't understand how to code this... I have found a similar relationship for sig^2, but have the same problem. Can somebody clarify how to do this to help my curve fitting? Maybe go as far to post psuedo code I can use?

Upvotes: 1

Views: 5372

Answers (2)

tBuLi
tBuLi

Reputation: 2325

Yes, implementing likelihood fitting with minimize is tricky, I spend a lot of time on it. Which is why I wrapped it. If I may shamelessly plug my own package symfit, your problem can be solved by doing something like this:

from symfit import Parameter, Variable, Likelihood, exp
import numpy as np

# Define the model for an exponential distribution
beta = Parameter()
x = Variable()
model = (1 / beta) * exp(-x / beta)

# Draw 100 samples from an exponential distribution with beta=5.5
data = np.random.exponential(5.5, 100)

# Do the fitting!
fit = Likelihood(model, data)
fit_result = fit.execute()

I have to admit I don't exactly understand your distribution, since I don't understand the role of your w, but perhaps with this code as an example, you'll know how to adapt it.

If not, let me know the full mathematical equation of your model so I can help you further.

For more info check the docs. (For a more technical description of what happens under the hood, read here and here.)

Upvotes: 3

cd98
cd98

Reputation: 3532

I think there's an issue with your setup. With maximum likelihood, you obtain the parameters that maximize the probability of observing your data (given a certain model). Your model seems to be:

enter image description here

where epsilon is N(0, sigma).

So you maximize it:

enter image description here

or equivalently take logs to get:

enter image description here

The f in this case is the log-normal probability density function which you can get with stats.norm.logpdf. You should then use scipy.minimize to maximize an expression that will be the summation of stats.norm.logpdf evaluated at each of the i points, from 1 to your sample size.

If I've understood you correctly, your code is missing having a y vector plus an x vector! Show us a sample of those vectors and I can update my answer to include a sample code for estimating MLE with that date.

Upvotes: 1

Related Questions