casella

Reputation: 11

exponential curve fit parameters in python do not make sense--fit itself looks great

I'm doing a curve fit in Python using scipy.optimize.curve_fit, and the fit itself looks great, however the parameters that are generated don't make sense.

The equation is (ax)^b + cx, but the parameters Python finds satisfy a = -c and b = 1, so the whole equation evaluates to 0 for every value of x.

here is the plot: https://i.sstatic.net/fBfg7.png

here is the experimental raw data I used: https://pastebin.com/CR2BCJji

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# cfu_u and OD_u are loaded from the raw data linked above
xdata = cfu_u
ydata = OD_u

# logarithmically spaced x values for plotting the fitted curve
min_cfu = 0.1
max_cfu = 9.1
x_vec = pow(10, np.arange(min_cfu, max_cfu, 0.1))

def func(x, a, b, c):
    return (a*x)**b + c*x

popt, pcov = curve_fit(func, xdata, ydata)

plt.plot(x_vec, func(x_vec, *popt), label='curve fit', color='slateblue', linewidth=2.2)
plt.plot(cfu_u, OD_u, '-', label='experimental data', marker='.', markersize=8, color='deepskyblue', linewidth=1.4)
plt.legend(loc='upper left', fontsize=12)
plt.ylabel("Y", fontsize=12)
plt.xlabel("X", fontsize=12)
plt.xscale("log")
plt.gcf().set_size_inches(7, 5)
plt.show()


print(popt)
[ 1.44930871e+03  1.00000000e+00 -1.44930871e+03]

I used the curve_fit function from scipy to fit an exponential curve to some data. The fit looks very good, so that part was a success.

However, the parameters output by the curve_fit function do not make sense, and evaluating f(x) with them gives f(x) = 0 for every value of x, which is clearly not what the fitted curve shows.
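The collapse can be checked directly from the printed popt: with b = 1 the model reduces to (a + c)·x, which is identically zero when a = -c, exactly the values curve_fit returned.

```python
import numpy as np

# The fitted parameters from the question: a = -c and b = 1
a, b, c = 1.44930871e+03, 1.0, -1.44930871e+03

def func(x, a, b, c):
    return (a * x) ** b + c * x

# With b = 1, (a*x)**b + c*x = (a + c)*x = 0 for every x
x = np.linspace(1.0, 1e6, 5)
print(func(x, a, b, c))  # all zeros
```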

Upvotes: 0

Views: 320

Answers (2)

Rory Yorke

Reputation: 2236

When I run your example (after adding imports, etc.), I get NaNs for popt. I eventually realized you were allowing a general real b with negative x. If I fit to the positive x only, I get a popt of [1.89176133e+01 5.66689997e+00 1.29380532e+08]. The fit isn't too bad (see below), but perhaps you need to restrict b to be an integer to fit the whole set. I'm not sure how to do that in SciPy; I assume you'd need mixed integer-real optimization, and I haven't investigated whether SciPy supports that.
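The NaNs come from how NumPy handles a negative base raised to a non-integer power: the result is NaN, so (a*x)**b is NaN wherever a*x < 0 while b is a free real parameter. A two-line check:

```python
import numpy as np

# A negative base with a fractional exponent yields NaN in NumPy,
# so (a*x)**b poisons the least-squares cost wherever a*x < 0.
x = np.array([-4.0, -1.0, 1.0, 4.0])
with np.errstate(invalid="ignore"):   # suppress the RuntimeWarning
    y = x ** 1.5
print(y)  # [nan nan  1.  8.]
```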

(plot of the fit against the positive-x data)

Code:

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

cfu_u, OD_u = np.loadtxt('data.txt', skiprows=1).T

# fit to positive x only
posmask = cfu_u > 0
xdata = cfu_u[posmask]
ydata = OD_u[posmask]

def func(x, a, b, c):
    return (a*x)**b + c*x

popt, pcov = curve_fit(func, xdata, ydata, p0=[1000,2,1])

x_vec = np.geomspace(xdata.min(), xdata.max())

plt.plot(x_vec, func(x_vec, *popt), label = 'curve fit',color='slateblue',linewidth = 2.2)
plt.plot(cfu_u,OD_u,'-',label = 'experimental data', marker='x',markersize=8,color='deepskyblue',linewidth = 1.4)
plt.legend(loc='upper left',fontsize=12)
plt.ylabel("Y",fontsize=12)
plt.xlabel("X",fontsize=12)
plt.yscale("log")
plt.xscale("symlog")
plt.show()

print(popt)
# [1.89176133e+01 5.66689997e+00 1.29380532e+08]

Upvotes: 0

Reinderien

Reputation: 15308

Modify your model to show what's actually happening:

def func(x: np.ndarray, a: float, b: float, c: float) -> np.ndarray:
    return (a*x)**(1 - b) + (c - a)*x

producing optimized parameters

[3.49003332e-04 6.60420171e-06 3.13366557e-08]

This is likely to be numerically unstable. Try optimizing in the log domain instead.

Upvotes: 1
