SDee

Reputation: 33

Scipy Curve_fit: how would I go about improving this fit?

I've been working on a standard potential which I am trying to fit with a given model: ax^2 - bx^3 + lx^4

The x and y values for the fit are generated from the code as well; the x values come from numpy.linspace. I have bounded the a, b, l parameters such that they are always positive. I needed the fit to mimic the data in such a way that, at a minimum, the height of the local maximum and the position of the global minimum are accurate. Instead, this is what I'm getting (blue is the actual data and the dashed line is the fit with the given model):

Here is the relevant part of my code:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.interpolate import *

v =246.
x = np.linspace(0.1, 270.1, 27001)  
Xdata = x/v

def func(x,a,b,l):
    return (a*(x**2)) - (b*(x**3)) + ((l)*(x**4))

temp = np.linspace(80.,80.,1)
b = np.zeros_like(temp)
a = np.zeros_like(temp)
l = np.zeros_like(temp)
Vfit = pot_giver(temp[0])   #External func.
Ydata = (Vfit - Vfit[0])/(pow(v,4))
popt, pcov = curve_fit(func, Xdata, Ydata,bounds=((0.,0.,0.), (np.inf,np.inf,np.inf)))
a[0], b[0], l[0] = popt   # popt is ordered (a, b, l), matching func's signature

plt.plot(Xdata,Ydata)
plt.plot(Xdata, func(Xdata, *popt), 'g--')
plt.show()

I am doing this with arrays because I am varying temp over a wide range; this single-element temp was created just for troubleshooting.

The Xdata and Ydata are available here, in the first and second columns respectively: Data

Upvotes: 3

Views: 121

Answers (3)

Reinderien

Reputation: 15283

In a comment (not in your question), you hinted that the model is fixed. You should re-evaluate that opinion, because adding only one more order produces what's essentially an exact fit:

import numpy as np
from matplotlib import pyplot as plt

x, y = np.loadtxt('79252144.ssv').T

param = np.polynomial.Polynomial.fit(x, y, deg=5)
print(param)

plt.plot(x, y, label='data')
plt.plot(x, param(x), label='fit')
plt.legend()
plt.show()
Output:

8.01740618e-05 - 0.00446738 (-1.00074074 + 1.82222222x) -
0.00787199 (-1.00074074 + 1.82222222x)**2 +
0.00078025 (-1.00074074 + 1.82222222x)**3 +
0.00697505 (-1.00074074 + 1.82222222x)**4 +
0.00287214 (-1.00074074 + 1.82222222x)**5
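Note that Polynomial.fit works in a scaled window variable, which is why the printed coefficients multiply powers of (-1.00074074 + 1.82222222x) rather than x itself. If you want plain coefficients of 1, x, ..., x**5, a small sketch using the same param object:

# Re-express the fitted polynomial in the unscaled x variable;
# with default arguments, convert() undoes the domain/window mapping.
plain = param.convert()
print(plain.coef)   # coefficients of 1, x, x**2, ..., x**5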

(plot: the degree-5 polynomial fit overlays the data almost exactly)

Upvotes: 3

lastchance

Reputation: 6745

You can also use scipy.optimize.leastsq

Here, you can also impose constraints (through a penalty added to the residuals when they are not satisfied).

You asked for the following constraints:

  • the maximum of the function is correct;

  • the position of the global minimum is correct.

It's a little sensitive to how you impose the constraints, but the following just about satisfies both.

import math
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import leastsq

Xdata, Ydata = np.loadtxt( "Pot_data_online.txt", unpack=True )
ymax = Ydata.max()
ymin = Ydata.min()
xymin = Xdata[0]
for x, y in zip( Xdata, Ydata ):
    if y == ymin: xymin = x


# Fitting function
def f( x, a, b, l ):
    return a * x ** 2 - b * x ** 3 + l * x ** 4


# Residuals, including a penalty to get the constraints
def residuals( p, x, y ):
    a, b, l = p
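    # Non-zero stationary points of f: roots of f'(x) = 2ax - 3bx^2 + 4lx^3 = 0,
    # i.e. x = (3b +/- sqrt(9b^2 - 32al)) / (8l). x1 is the local maximum,
    # x2 the non-trivial minimum.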
    x1 = ( 3 * b - math.sqrt( max( 9 * b ** 2 - 32 * a * l, 0 ) ) ) / ( 8 * l )
    x2 = ( 3 * b + math.sqrt( max( 9 * b ** 2 - 32 * a * l, 0 ) ) ) / ( 8 * l )
    error = y - f( x, a, b, l )
    penalty = 1e6 * ( abs( f(x1,a,b,l) / ymax - 1 ) + abs( x2 / xymin - 1 ) )
    return abs( error ) + penalty

popt, ier = leastsq( func=residuals, x0=(0.01,0.3,0.01), args=(Xdata,Ydata) )   # second return value is a status flag, not a covariance
a, b, l = popt
x1 = ( 3 * b - math.sqrt( max( 9 * b ** 2 - 32 * a * l, 0 ) ) ) / ( 8 * l )
x2 = ( 3 * b + math.sqrt( max( 9 * b ** 2 - 32 * a * l, 0 ) ) ) / ( 8 * l )
print( "a, b, l = ", a, b, l )
print( "Constraint on maximum:             ymax , fmax  = ", ymax , f( x1, a, b, l ) )
print( "Constraint on position of minimum: xymin, xfmin = ", xymin, x2               )
plt.plot( Xdata,Ydata, 'b-' )
plt.plot( Xdata, f( Xdata, *popt ), 'g--')
plt.show()

Output:

a, b, l =  0.026786218092281242 0.0754627163616853 0.04481075395203686
Constraint on maximum:             ymax , fmax  =  0.0007404005868381408 0.0007404009309691355
Constraint on position of minimum: xymin, xfmin =  0.947308 0.947621778120905

(plot: the data with the constrained quartic fit overlaid)

Upvotes: 3

Nick ODell

Reputation: 25409

I don't believe there is a better solution that curve_fit() is failing to find.

Here is a list of things I tried.

Baseline

First, I started with your current solution and computed its MSE (mean squared error).

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def func(x,a,b,l):
    return (a*(x**2)) - (b*(x**3)) + ((l)*(x**4))

def mse(params, Xdata, Ydata):
    return np.mean((Ydata - func(Xdata, *params)) ** 2)

temp = np.linspace(80.,80.,1)
b = np.zeros_like(temp)
a = np.zeros_like(temp)
l = np.zeros_like(temp)

all_data = np.loadtxt('Pot_data_online.txt')
Xdata = all_data[:, 0]
Ydata = all_data[:, 1]

popt, pcov = curve_fit(func, Xdata, Ydata,bounds=((0.,0.,0.), (np.inf,np.inf,np.inf)))
a[0], b[0], l[0] = popt   # popt is ordered (a, b, l)

plt.plot(Xdata,Ydata, label='data')
plt.plot(Xdata, func(Xdata, *popt), 'g--', label='fit')
print(popt)
plt.legend()
plt.show()

curve_fit_baseline = mse(popt, Xdata, Ydata)

Using a different method

I tried changing method to trf, dogbox, and lm. None of these produced significantly better fits.

popt, pcov = curve_fit(
    func,
    Xdata,
    Ydata,
    bounds=((0.,0.,0.), (np.inf,np.inf,np.inf)),
    method='dogbox',
    ftol=1e-15,
    gtol=1e-15,
    xtol=1e-15,
)
fun = mse(popt, Xdata, Ydata)
print("curve_fit score", curve_fit_baseline)
print("dogbox score   ", fun)
print("dogbox improvement (higher is better)", (curve_fit_baseline - fun) / curve_fit_baseline)

Output:

curve_fit score 5.214343649304456e-08
dogbox score    5.2143436493044563e-08
dogbox improvement (higher is better) -1.269084921418044e-16

The improvement is negative, which means that switching to dogbox was slightly worse than the default method, trf. The lm method cannot accept bounds, but if you remove them, it also doesn't do any better. Technically trf does better than dogbox, but only by about 1.26e-14%, and I don't imagine you care about such a small improvement.
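For reference, the unbounded lm run looks roughly like this (a sketch reusing func, mse and the data from above; the starting point is arbitrary):

popt, pcov = curve_fit(
    func,
    Xdata,
    Ydata,
    p0=(0.01, 0.01, 0.01),   # arbitrary positive starting point
    method='lm',             # Levenberg-Marquardt does not support bounds
)
fun = mse(popt, Xdata, Ydata)
print("curve_fit score", curve_fit_baseline)
print("lm score       ", fun)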

Minimizer

Sometimes curve_fit() can be confused by gradients which are too large or small, and a local minimizer can do better.

from scipy.optimize import minimize, basinhopping, Bounds

result = minimize(
    mse,
    np.zeros(3),
    bounds=Bounds((0.,0.,0.), (np.inf,np.inf,np.inf)),
    args=(Xdata, Ydata),
    options=dict(
        ftol=0,
        gtol=0,
    ),
    method='L-BFGS-B',
)

print("curve_fit score", curve_fit_baseline)
print("minimize score ", result.fun)
print("minimize improvement (higher is better)", (curve_fit_baseline - result.fun) / curve_fit_baseline)

This didn't help.

I also tried varying method to COBYQA or SLSQP. This didn't help either.
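For example, the SLSQP variant only changes the method and options (a sketch reusing mse, the data, and the bounds from above):

result = minimize(
    mse,
    np.zeros(3),
    bounds=Bounds((0., 0., 0.), (np.inf, np.inf, np.inf)),
    args=(Xdata, Ydata),
    options=dict(ftol=0),   # SLSQP uses ftol; it has no gtol option
    method='SLSQP',
)
print("SLSQP score", result.fun)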

Global optimization

Sometimes curve_fit() can be unable to deal with local minima. I tried basinhopping to avoid this.

Example:

from scipy.optimize import basinhopping, Bounds

result = basinhopping(
    mse,
    np.zeros(3),
    minimizer_kwargs=dict(
        bounds=Bounds((0.,0.,0.), (np.inf,np.inf,np.inf)),
        args=(Xdata, Ydata),
        options=dict(
            ftol=1e-16,
            gtol=1e-7,
        ),
    ),
    niter=1000,
)
print("curve_fit score   ", curve_fit_baseline)
print("basinhopping score", result.fun)
print("basinhopping improvement (higher is better)", (curve_fit_baseline - result.fun) / curve_fit_baseline)

This also doesn't help.

Non-negative least squares

I tried reformulating this problem as a linear regression by pre-processing Xdata to include all of the relevant powers of x. SciPy then has a specialized solver, nnls, for non-negative least-squares problems.

from sklearn.preprocessing import PolynomialFeatures
from scipy.optimize import nnls

polynomial_transformer = PolynomialFeatures(degree=(2, 4), include_bias=False)
transformed = polynomial_transformer.fit_transform(Xdata.reshape(-1, 1))
# second term has minus sign. Negate feature to make this problem equivalent
transformed[:, 1] *= -1
x, rnorm = nnls(transformed, Ydata)
fun = mse(x, Xdata, Ydata)

print("curve_fit score", curve_fit_baseline)
print("lstsq score    ", fun)
print("lstsq improvement (higher is better)", (curve_fit_baseline - fun) / curve_fit_baseline)

This ties your current solution to the last decimal. It also has the advantage of being about 30x faster than your current solution.
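Because the x**3 column was negated before the solve, the nnls solution vector maps directly onto (a, b, l) in func, so you can plot the fit the same way as before (a small sketch reusing x, func and the data from above):

# Unpack the nnls coefficients as (a, b, l) and overlay the fit on the data
a_nnls, b_nnls, l_nnls = x
plt.plot(Xdata, Ydata, label='data')
plt.plot(Xdata, func(Xdata, a_nnls, b_nnls, l_nnls), 'g--', label='nnls fit')
plt.legend()
plt.show()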

This, in addition to the basinhopping results, also proves that your problem does not contain local minima: nnls cannot get trapped in local minima, so if the problem can be reformulated as one that nnls can solve, there are no local minima to get trapped in.

Conclusion

You cannot do any better than the solution that curve_fit() is giving you. You could change the model, but you've already indicated that this is not an option.

Upvotes: 3
