Reputation: 466
I am trying to rewrite something similar to the following SAS optimization code in Python. The goal of the code is to find parameters for a continuous (in this case normal) distribution that best fits some empirical data. I am familiar with Python but very new to SAS.
proc nlin data=mydata outest=est METHOD=MARQUARDT SMETHOD=GOLDEN SAVE;
    parms m = -1.0 to 1.0 by 0.1
          s = 1.0 to 2.0 by 0.01;
    bounds -10<m<10;
    bounds 0<s<10;
    model NCDF = CDF('NORMAL',XValue,m,s);
run;
In Python, I have something set up like this:
import pandas as pd
from scipy import stats
from scipy.optimize import minimize
def distance_empirical_fitted_cdf( params: list, data: pd.Series ):
    m, s = params
    # data holds percentage weights, so the running sum is the empirical CDF
    empirical_cdf = ( data / 100 ).cumsum()
    cost = 0
    for point in range( 10 ):
        empirical_cdf_at_point = empirical_cdf.iloc[ point ]
        fitted_cdf_at_point = stats.norm.cdf( x = point, loc = m, scale = s )
        cost += ( fitted_cdf_at_point - empirical_cdf_at_point ) ** 2
    return cost

result = minimize( distance_empirical_fitted_cdf, x0=[ 0, 1.5 ], args=( distribution, ),
                   bounds=[ ( -10, 10 ), ( 0, 10 ) ] )
fitted_m, fitted_s = result.x
The code I have now gets me fairly close to the SAS code's output in most cases, but not all. Ideally, I could get them to match (or be as close as possible) and understand why they don't.
As far as I can tell, there are two sources of discrepancy. First, the SAS code takes a grid of possible starting values (in this case -1.0 to 1.0 by 0.1 for m and 1.0 to 2.0 by 0.01 for s) to initialize the parameters. Is there an equivalent of this in Python?
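The closest equivalent I've come up with myself is to evaluate the objective over the same grid and start the optimization from the best grid point, though I don't know whether this is actually what PROC NLIN does internally:
import numpy as np
from itertools import product

# Assumed to mirror the SAS parms grid: score every combination of starting
# values, then optimize once from the best one.
m_grid = np.arange( -1.0, 1.01, 0.1 )   # m = -1.0 to 1.0 by 0.1
s_grid = np.arange( 1.0, 2.001, 0.01 )  # s = 1.0 to 2.0 by 0.01
x0 = min( product( m_grid, s_grid ),
          key=lambda p: distance_empirical_fitted_cdf( p, distribution ) )
result = minimize( distance_empirical_fitted_cdf, x0=list( x0 ),
                   args=( distribution, ), bounds=[ ( -10, 10 ), ( 0, 10 ) ] )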
Second, the SAS code specifically uses the Marquardt optimization method and the Golden step size search method. The only Python code I could find referencing the Marquardt method is scipy.optimize.least_squares with method="lm", but this doesn't support bounds (and its result is much further off from the SAS output than scipy.optimize.minimize's when I try it without bounds).
The only Python code I could find referencing the golden step size search method is scipy.optimize.golden, but the documentation says this is for minimizing functions of one variable, and it doesn't seem to support bounds either.
Any insight on getting the Python output closer to the SAS output would be greatly appreciated, thanks!
Upvotes: 1
Views: 313
Reputation: 1121
Not an answer, as it's still inconclusive why (or how) the two sets of code produce different results, so this will likely change as more info is introduced. In the meantime, here are some observations that might be useful but are too much to fit into the comments section.
Algorithm
The Marquardt method as called by SAS is more commonly referred to (at least by me) as the Levenberg-Marquardt algorithm, abbreviated LMA or LM.
Bounds
The math of LMA is not defined to handle bounds, which is why scipy's Levenberg-Marquardt routines provide no bounds option.
The MINPACK-1 implementation used in scipy.optimize.leastsq for the Levenberg-Marquardt algorithm does not explicitly support bounds on parameters, and expects to be able to fully explore the available range of values for any Parameter. Simply placing hard constraints (that is, resetting the value when it exceeds the desired bounds) prevents the algorithm from determining the partial derivatives, and leads to unstable results.
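That said, scipy.optimize.least_squares does accept box bounds with its trust-region methods ("trf", the default, and "dogbox"), just not with method="lm". A minimal sketch, assuming the question's objective is recast as per-point residuals and distribution holds the data:
from scipy import stats
from scipy.optimize import least_squares

def residuals( params, data ):
    # One residual per evaluation point; least_squares minimizes
    # the sum of their squares internally.
    m, s = params
    empirical_cdf = ( data / 100 ).cumsum()
    return [ stats.norm.cdf( x, loc=m, scale=s ) - empirical_cdf.iloc[ x ]
             for x in range( 10 ) ]

# method="lm" is the MINPACK driver quoted above and rejects bounds;
# method="trf" handles the same box constraints as the question's minimize() call.
fit = least_squares( residuals, x0=[ 0, 1.5 ], args=( distribution, ),
                     bounds=( [ -10, 0 ], [ 10, 10 ] ), method="trf" )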
The discrepancy in results may arise from this lack of bounds support, depending on how SAS decides to handle the bounds it is given. It may be that SAS ignores the provided bound values, re-runs until a solution within the bounds is found, or uses some other method entirely.
Another possible cause is that a better solution exists outside of your provided bounds. Then either SAS limits its solutions to within the bounds but Python doesn't, or vice versa. This would result in one code returning a local minimum within the bounds while the other returns a better (possibly global) minimum outside them.
However, I can't find where the SAS docs for NLIN explain how this is handled, so this is still inconclusive.
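One quick way to test the out-of-bounds hypothesis with the question's setup is to run the same minimize() call with and without the bounds and compare the two optima (this reuses distance_empirical_fitted_cdf and distribution from the question):
from scipy.optimize import minimize

bounded = minimize( distance_empirical_fitted_cdf, x0=[ 0, 1.5 ],
                    args=( distribution, ), bounds=[ ( -10, 10 ), ( 0, 10 ) ] )
free = minimize( distance_empirical_fitted_cdf, x0=[ 0, 1.5 ],
                 args=( distribution, ) )
# If the unbounded run finds a noticeably lower cost at parameters outside
# the box, the two tools may simply be reporting different minima.
print( bounded.fun, bounded.x )
print( free.fun, free.x )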
Step-Size Search
Note that the NLIN procedure's SMETHOD option is used to search for an optimal step size. It's unclear from the SAS docs on SMETHOD what the "step size" is exactly, but I believe in this context it could refer to the damping parameter lambda.
As this parameter is quite important to the performance of LMA, different ways of determining it could also affect convergence (and thus the final results), as sketched below.
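For concreteness, here is an illustrative numpy sketch of a single Levenberg-Marquardt update (not SAS's actual implementation), showing where the damping parameter enters and, in case "step size" instead means the step length, how a golden-section line search could pick it:
import numpy as np
from scipy.optimize import minimize_scalar

def lm_step( beta, x, y, f, jac, lam ):
    # One illustrative LM update. lam is the damping parameter: large lam
    # gives cautious gradient-descent-like steps, small lam gives aggressive
    # Gauss-Newton-like steps.
    r = y - f( x, beta )          # residuals at the current parameters
    J = jac( x, beta )            # Jacobian of f w.r.t. beta, shape (n, p)
    JtJ = J.T @ J
    delta = np.linalg.solve( JtJ + lam * np.diag( np.diag( JtJ ) ), J.T @ r )
    # A golden-section line search (cf. SMETHOD=GOLDEN) over the step length:
    sse = lambda alpha: np.sum( ( y - f( x, beta + alpha * delta ) ) ** 2 )
    alpha = minimize_scalar( sse, bracket=( 0, 1 ), method="golden" ).x
    return beta + alpha * delta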
Again, all of this depends on how the two results differ.
If the two results are significantly different (completely different output CDFs), then chances are only one of them matches the actual data. Whichever code produces the CDF that doesn't match the data is probably doing something wrong and needs to be scrutinized.
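A quick way to run that comparison, assuming sas_m/sas_s hold PROC NLIN's estimates and fitted_m/fitted_s come from the question's minimize() call (all four names here are placeholders):
import numpy as np
from scipy import stats

points = np.arange( 10 )
empirical = ( distribution / 100 ).cumsum().iloc[ :10 ].to_numpy()

# sas_m, sas_s and fitted_m, fitted_s are placeholders for the two fits.
for label, ( m, s ) in [ ( "SAS", ( sas_m, sas_s ) ),
                         ( "scipy", ( fitted_m, fitted_s ) ) ]:
    sse = np.sum( ( stats.norm.cdf( points, loc=m, scale=s ) - empirical ) ** 2 )
    print( label, "m =", m, "s =", s, "SSE =", sse )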
Upvotes: 1