user2986671
user2986671

Reputation: 33

Implementation of Stata's "nl" package (nonlinear least squares) in Python

Does anybody know if there is an implementation of the Stata nl (nonlinear least squares) package in Python? I tried to use lmfit as well as optimize.leastsq from scipy, but both do not seem to work.

The equation for the regression is

Y = x1 + b1 + 0.3*log(x2-b2)*b3 - 0.7*x3*b3 + b5*x2

where Y is the dependent variable, the x's are the independent variables, and the b's are the coefficients to be estimated.

Using the lmfit package, i tried the following:


from lmfit import minimize, Parameters, Parameter, report_fit
import pandas as pd
import numpy as np

inputfile = "testdata.csv"
df = pd.read_csv(inputfile)

x1= df['x1']
x2 = df['x2']
x3= df['x3']
y= df['y']


def fcn2min(params, x1, x2, x3, y):

    b1 = params['b1'].value
    b2 = params['b2'].value
    b3 = params['b3'].value
    b5 = params['b5'].value
    model = x1 + b1 + (0.3)*np.log(x2-b2)*b3 - (0.7)*x3*b3 + b5*x2
    return model - y

params = Parameters()

params.add('b1', value= 10)

params.add('b2', value= 1990)

params.add('b3', value= 5)

params.add('b5', value= 12)


result = minimize(fcn2min, params, args=(x1, x2, x3, y))

print report_fit(result) 

As a result, all parameters are estimated to be NaN. Can anybody explain what I did wrong? Or, is there a good implementation of Stata's nl function in Python? Many thanks!

Here's the data in the CSV file:


x1,x2,x3,y
1981,15.2824955,14.56475067,2.936807632
1982,15.2635746,15.52343941,2.908272743
1983,15.30461597,16.30871582,2.940227509
1984,15.37490845,16.76519966,3.001846313
1985,15.41295338,17.04235458,3.030970573
1986,15.44680405,17.25271797,3.055702209
1987,15.48135281,17.44781876,3.081344604
1988,15.52259159,17.62217331,3.113491058
1989,15.5565939,17.71343422,3.138068199
1990,15.57392025,17.81187439,3.144176483
1991,15.57197666,17.89474106,3.128887177
1992,15.60479259,17.98217583,3.14837265
1993,15.63134575,18.06685829,3.161927223
1994,15.67116165,18.16578865,3.18959713
1995,15.69621944,18.27449799,3.202876091
1996,15.7329874,18.38712311,3.228042603
1997,15.77698135,18.50685883,3.260077477
1998,15.81788635,18.63579178,3.289312363
1999,15.86141682,18.76427078,3.321393967
2000,15.89737129,18.89691544,3.34650898
2001,15.90485096,18.99729347,3.344522476
2002,15.92070866,19.06253433,3.351119995


Upvotes: 0

Views: 321

Answers (1)

M Newville
M Newville

Reputation: 7862

Just to set the record straight, the reason for the failure here is because you are not checking for the case that x2-b2 might be negative, so that np.log(x2-b2) is NaN. Certainly, if the objective function returns NaN, the fit is going to stop and not be able to find a good solution. You could try adding an upper bound on b2. Like others, I suspect that if you are guessing b1 to be 10 and b2 to be 1990 that you have some simple error in your objective function that is causing NaN to occur. It's often good to call the objective function once, and maybe even plot the starting condition.

Or you can blame the tool.

Upvotes: 1

Related Questions