Reputation: 9
The fitted curve doesn't fit the data points (xH_data, nH_data) as expected. Does anyone know what the issue might be here?
from scipy.optimize import curve_fit
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
xH_data = np.array([1., 1.03, 1.06, 1.1, 1.2, 1.3, 1.5, 1.7, 2., 2.6, 3., 4., 5., 6.])
nH_data = np.array([403., 316., 235., 160., 70.8, 37.6, 14.8, 7.11, 2.81, 0.665, 0.313, 0.090, 0.044, 0.029])*1.0e6
plt.plot(xH_data, nH_data)
plt.yscale("log")
plt.xscale("log")
def eTemp(x, A, a, B):
    n = B*(A+x)**a
    return n
parameters, covariance = curve_fit(eTemp, xH_data, nH_data, maxfev=200000)
fit_A = parameters[0]
fit_a = parameters[1]
fit_B = parameters[2]
print(fit_A)
print(fit_a)
print(fit_B)
r = np.logspace(0, 0.7, 1000)
ne = fit_B *(fit_A + r)**(fit_a)
plt.plot(r, ne)
plt.yscale("log")
plt.xscale("log")
Thanks in advance for the help.
Upvotes: 0
Views: 2065
Reputation: 4043
Ok, here is a different approach. As usual, the main problem is the initial guesses for the non-linear fit (for details, check this). Here, those are evaluated by using an integral relation of the fit function y(x) = a (x - c)^p, namely int y dx = (x - c) y / (p + 1) + d = x y / (p + 1) - c y / (p + 1) + d. This means we can get c and p via a linear fit of int y against x y and y. Once those are known, a follows from a simple linear fit. It will turn out that these guesses are already quite good. Nevertheless, they will go as initial values into a non-linear fit providing the final result.
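For reference, the integral relation is one integration step:

$$\int y\,dx = \int a\,(x-c)^p\,dx = \frac{a\,(x-c)^{p+1}}{p+1} + d = \frac{x\,y}{p+1} - \frac{c\,y}{p+1} + d$$

Writing $\int y = \alpha\,x y + \beta\,y + d$, a linear fit gives $\alpha$ and $\beta$, and hence $p = 1/\alpha - 1$ and $c = -\beta\,(p+1)$; this is exactly how pF and cF are recovered in the code. In detail, this goes like this: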
import matplotlib.pyplot as plt
import numpy as np
from scipy.integrate import cumulative_trapezoid ## called cumtrapz in older SciPy
from scipy.optimize import curve_fit
xHdata = np.array(
[
1.0, 1.03, 1.06, 1.1, 1.2, 1.3, 1.5,
1.7, 2.0, 2.6, 3.0, 4.0, 5.0, 6.0
]
)
nHdata = np.array(
[
403.0, 316.0, 235.0, 160.0, 70.8, 37.6,
14.8, 7.11, 2.81, 0.665, 0.313, 0.090, 0.044, 0.029
]
) * 1.0e6
def fit_func( x, a, c, p ):
    out = a * ( x - c )**p
    return out
### fitting the non-linear parameters as part of an integro-equation
### this is the standard matrix formulation of a linear fit
Sy = cumulative_trapezoid( nHdata, x=xHdata, initial=0 ) ## int( y )
VMXT = np.array( [ xHdata * nHdata , nHdata, np.ones( len( nHdata ) ) ] ) ## ( x y, y, d )
VMX = VMXT.transpose()
A = np.dot( VMXT, VMX )
SV = np.dot( VMXT, Sy )
sol = np.linalg.solve( A , SV )
print ( sol )
pF = 1 / sol[0] - 1
print( pF )
cF = -sol[1] * ( pF + 1 )
print( cF )
### making a linear fit on the scale
### the short version of the matrix form if only one factor is calculated
fk = fit_func( xHdata, 1, cF, pF )
aF = np.dot( nHdata, fk ) / np.dot( fk, fk )
print( aF )
#### using these guesses as input for a final non-linear fit
sol, cov = curve_fit(fit_func, xHdata, nHdata, p0=( aF, cF, pF ) )
print( sol )
print( cov )
### plotting
xth = np.linspace( 1, 6, 125 )
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
ax.scatter( xHdata, nHdata )
ax.plot( xth, fit_func( xth, aF, cF, pF ), ls=':' )
ax.plot( xth, fit_func( xth, *sol ) )
plt.show()
Providing:
[-3.82334284e-01 2.51613126e-01 5.41522867e+07]
-3.6155122388787175
0.6580972107001803
8504146.59883185
[ 5.32486242e+07 2.44780953e-01 -7.24897172e+00]
[[ 1.03198712e+16 -2.71798924e+07 -2.37545914e+08]
[-2.71798924e+07 7.16072922e-02 6.26461373e-01]
[-2.37545914e+08 6.26461373e-01 5.49910325e+00]]
(note the high correlation of a with c and p) and a plot of the data together with the dotted initial-guess curve and the final fit.
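If you want to see that correlation as actual numbers, you can normalize the returned covariance into a correlation matrix (a small sketch using the cov and np already defined above):

sig = np.sqrt( np.diag( cov ) )   ## standard deviations of a, c, p
corr = cov / np.outer( sig, sig ) ## covariance -> correlation
print( corr )                     ## off-diagonals near +-1 indicate strong coupling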
Upvotes: 1
Reputation: 1
There is a problem with your fit equation. If A is less than -1, then A + x is negative at the start of your fit range, and a negative number raised to the non-integer power a gives a complex (in NumPy: NaN) value for your function. For this reason you need to add constraints and an initial set of parameters to your curve_fit call, for example:
parameters, covariance = curve_fit(eTemp, xH_data, nH_data, method='dogbox', p0 = [100, -3.3, 10E8], bounds=((-0.9, -10, 0), (200, -1, 10e9)), maxfev=200000)
Note that a fit with bounds cannot use curve_fit's default Levenberg-Marquardt algorithm; you need a bound-aware method such as 'trf' or 'dogbox' to perform this fit with the constraints.
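To see the failure mode in isolation, here is a minimal sketch (the values are illustrative, not fit results):

import numpy as np

A, a = -1.5, -3.3            # A < -1 makes A + x negative at x = 1
x = 1.0
print((A + x)**a)            # plain Python: a complex number
print(np.float64(A + x)**a)  # NumPy: nan, with an invalid-value warning

curve_fit cannot make progress once the residuals contain NaN, which is why the bounds above keep A away from that region.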
Upvotes: 0
Reputation: 1275
I think you're nearly there; you just need to fit on a log scale and throw in a decent guess. To make the guess, you just need a quick plot like
plt.figure()
plt.plot(np.log(xH_data), np.log(nH_data))
and you'll see it's nearly linear. So your B will be the exponentiated intercept (i.e. exp(20ish)) and a is the approximate slope (-5ish). A is a weird one: does it have some physical meaning, or did you just throw it in there? If there's no physical meaning, I'd say get rid of it.
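If you'd rather read those guesses off numerically than eyeball them, a one-line fit does it (a sketch assuming the xH_data and nH_data arrays from the question are defined):

import numpy as np

slope, intercept = np.polyfit(np.log(xH_data), np.log(nH_data), 1)
print(slope)              # roughly -5, the starting guess for a
print(np.exp(intercept))  # roughly exp(20), the starting guess for B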
from scipy.optimize import curve_fit
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
xH_data = np.array([1., 1.03, 1.06, 1.1, 1.2, 1.3, 1.5, 1.7, 2., 2.6, 3., 4., 5., 6.])
nH_data = np.array([403., 316., 235., 160., 70.8, 37.6, 14.8, 7.11, 2.81, 0.665, 0.313, 0.090, 0.044, 0.029])*1.0e6
def eTemp(x, A, a, B):
    logn = np.log(B*(x + A)**a)
    return logn
parameters, covariance = curve_fit(eTemp, xH_data, np.log(nH_data), p0=[np.exp(0.1), -5, np.exp(20)], maxfev=200000)
fit_A = parameters[0]
fit_a = parameters[1]
fit_B = parameters[2]
print(fit_A)
print(fit_a)
print(fit_B)
r = np.logspace(0, 0.7, 1000)
ne = np.exp(eTemp(r, fit_A, fit_a, fit_B))
plt.plot(xH_data, nH_data)
plt.plot(r, ne)
plt.yscale("log")
plt.xscale("log")
Upvotes: 0
Reputation: 577
I know of two things that might help you:

- Pass the p0 input parameter to curve_fit with a set of appropriate starting parameters for the function. That can keep the algorithm from running wild.
- Let your function return np.log(n) and then make the fit to np.log(nH_data). As it is now, there is a far larger penalty for not fitting the first data points than for not fitting the last data points, as the values are about 10^2 larger for the first ones. Thus, the first data points become "more important" to fit for the algorithm. Taking the logarithm puts them more on the same scale, so that all points are weighted equally.

Go ahead and play around with it (a sketch combining both tips follows below). I managed a pretty fine fit with these parameters:
[-7.21450545e-01 -3.36131028e+00 5.97293632e+06]
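For concreteness, a minimal sketch of both suggestions together, reusing the eTemp model from the question on a log scale (the p0 values are simply the fit result quoted above):

import numpy as np
from scipy.optimize import curve_fit

xH_data = np.array([1., 1.03, 1.06, 1.1, 1.2, 1.3, 1.5, 1.7, 2., 2.6, 3., 4., 5., 6.])
nH_data = np.array([403., 316., 235., 160., 70.8, 37.6, 14.8, 7.11, 2.81,
                    0.665, 0.313, 0.090, 0.044, 0.029])*1.0e6

def log_eTemp(x, A, a, B):
    # the power law from the question, returned on a log scale
    return np.log(B) + a*np.log(A + x)

# appropriate starting parameters keep the algorithm from running wild
p0 = [-0.72, -3.36, 5.97e6]
parameters, covariance = curve_fit(log_eTemp, xH_data, np.log(nH_data), p0=p0)
print(parameters)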
Upvotes: 0