Francesco Azzollini

Reputation: 9

Fitting a curve to some datapoints

The fitted curve doesn't fit the data points (xH_data, nH_data) as expected. Does anyone know what the issue might be?

from scipy.optimize import curve_fit
import numpy as np
import matplotlib
import matplotlib.pyplot as plt

xH_data = np.array([1., 1.03, 1.06, 1.1, 1.2, 1.3, 1.5, 1.7, 2., 2.6, 3., 4., 5., 6.])
nH_data = np.array([403., 316., 235., 160., 70.8, 37.6, 14.8, 7.11, 2.81, 0.665, 0.313, 0.090, 0.044, 0.029])*1.0e6

plt.plot(xH_data, nH_data)
plt.yscale("log")
plt.xscale("log")

def eTemp(x, A, a, B):
    n =  B*(A+x)**a 
    return n

parameters, covariance = curve_fit(eTemp, xH_data, nH_data, maxfev=200000)
fit_A = parameters[0]
fit_a = parameters[1]
fit_B = parameters[2]



print(fit_A)
print(fit_a)
print(fit_B)


r = np.logspace(0, 0.7, 1000)
ne =   fit_B *(fit_A + r)**(fit_a)
plt.plot(r, ne)

plt.yscale("log")
plt.xscale("log")

Thanks in advance for the help.

Upvotes: 0

Views: 2065

Answers (4)

mikuszefski

Reputation: 4043

Ok, here is a different approach. As usual, the main problem is the initial guesses for the non-linear fit (for details, check this). Here, those are evaluated by using an integral relation of the fit function y(x) = a (x - c)^p, namely

    int( y ) = (x - c) y / (p + 1) + d = x y / (p + 1) - c y / (p + 1) + d

This means we can get c and p via a linear fit of int( y ) against x y and y. Once those are known, a is a simple linear fit. It will turn out that these guesses are already quite good. Nevertheless, they go as initial values into a non-linear fit providing the final result. In detail this goes like this:

import matplotlib.pyplot as plt
import numpy as np
from scipy.integrate import cumulative_trapezoid as cumtrapz  # old name cumtrapz was removed in recent SciPy versions
from scipy.optimize import curve_fit

xHdata = np.array(
    [
        1.0, 1.03, 1.06, 1.1, 1.2, 1.3, 1.5,
        1.7, 2.0, 2.6, 3.0, 4.0, 5.0, 6.0
    ]
)
nHdata = np.array(
    [
        403.0, 316.0, 235.0, 160.0, 70.8, 37.6,
        14.8, 7.11, 2.81, 0.665, 0.313, 0.090, 0.044, 0.029
    ]
) * 1.0e6

def fit_func( x, a, c, p ):
    out =  a * ( x - c )**p 
    return out

### fitting the non-linear parameters as part of an integro-equation
### this is the standard matrix formulation of a linear fit
Sy = cumtrapz( nHdata, x=xHdata, initial=0 ) ## int( y )
VMXT = np.array( [  xHdata * nHdata , nHdata, np.ones( len( nHdata ) ) ] ) ## ( x y, y, d )
VMX = VMXT.transpose()
A = np.dot( VMXT, VMX )
SV = np.dot( VMXT, Sy )
sol = np.linalg.solve( A , SV )
print ( sol )
pF = 1 / sol[0] - 1
print( pF )
cF = -sol[1] * ( pF + 1 ) 
print( cF )
### making a linear fit on the scale
### the short version of the matrix form if only one factor is calculated
fk = fit_func( xHdata, 1, cF, pF )
aF = np.dot( nHdata, fk ) / np.dot( fk, fk )
print( aF )

#### using these guesses as input for a final non-linear fit
sol, cov = curve_fit(fit_func, xHdata, nHdata, p0=( aF, cF, pF ) )
print( sol )
print( cov )

### plotting 
xth = np.linspace( 1, 6, 125 )

fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
ax.scatter( xHdata, nHdata )
ax.plot( xth, fit_func( xth, aF, cF, pF ), ls=':' )
ax.plot( xth, fit_func( xth, *sol ) )
plt.show()

Providing:

[-3.82334284e-01  2.51613126e-01  5.41522867e+07]
-3.6155122388787175
0.6580972107001803
8504146.59883185
[ 5.32486242e+07  2.44780953e-01 -7.24897172e+00]
[[ 1.03198712e+16 -2.71798924e+07 -2.37545914e+08]
 [-2.71798924e+07  7.16072922e-02  6.26461373e-01]
 [-2.37545914e+08  6.26461373e-01  5.49910325e+00]]

(Note the high correlation of a with c and p.) The plot shows the data, the initial guess (dotted), and the non-linear fit.
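The integral identity the pre-fit relies on can be verified numerically. A sketch with hypothetical parameters (independent of the data above): the cumulative integral of y = a (x - c)^p should match x y / (p + 1) - c y / (p + 1) up to the constant d.

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

# hypothetical test parameters, just to check the identity numerically
a, c, p = 2.0, 0.5, -3.0
x = np.linspace(1.0, 6.0, 2001)
y = a * (x - c)**p

# int( y ) = (x - c) y / (p + 1) + d = x y / (p + 1) - c y / (p + 1) + d
lhs = cumulative_trapezoid(y, x=x, initial=0.0)
rhs = x * y / (p + 1) - c * y / (p + 1)
rhs -= rhs[0]                      # absorb the integration constant d

print(np.max(np.abs(lhs - rhs)))   # small discretisation error only
```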

Upvotes: 1

Henry

Reputation: 1

There is a problem with your fit equation. If A is less than -1, the base A + x is negative for part of your fit range, and with a non-integer exponent a you then get an imaginary value (nan in NumPy) for your function. For this reason you need to add constraints and an initial set of parameters to your curve_fit call, for example:

parameters, covariance = curve_fit(eTemp, xH_data, nH_data, method='dogbox', p0 = [100, -3.3, 10E8], bounds=((-0.9, -10, 0), (200, -1, 10e9)), maxfev=200000)

You need to change the method to 'dogbox' in order to perform this fit with the constraints.
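A minimal sketch of the failure mode, with hypothetical parameter values: a negative base raised to a non-integer power has no real value, so NumPy returns nan, which derails the fit.

```python
import numpy as np

# hypothetical values chosen to trigger the problem:
# with A = -1.5 the base A + x is negative for the first data points
A, a = -1.5, -3.3
x = np.array([1.0, 1.3, 2.0])
vals = (A + x)**a
print(vals)  # entries with a negative base come out as nan
```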

Upvotes: 0

Andrew Holmgren

Reputation: 1275

I think you're nearly there; you just need to fit on a log scale and supply a decent guess. To make the guess, start with a plot like

plt.figure()
plt.plot(np.log(xH_data), np.log(nH_data))

and you'll see it's nearly linear. So your B will be the exponentiated intercept (i.e. exp(20)-ish) and a is the approximate slope (about -5). A is the weird one: does it have some physical meaning, or did you just throw it in there? If there's no physical meaning, I'd say get rid of it.

from scipy.optimize import curve_fit
import numpy as np
import matplotlib.pyplot as plt

xH_data = np.array([1., 1.03, 1.06, 1.1, 1.2, 1.3, 1.5, 1.7, 2., 2.6, 3., 4., 5., 6.])
nH_data = np.array([403., 316., 235., 160., 70.8, 37.6, 14.8, 7.11, 2.81, 0.665, 0.313, 0.090, 0.044, 0.029])*1.0e6

def eTemp(x, A, a, B):
    logn =  np.log(B*(x + A)**a)
    return logn

parameters, covariance = curve_fit(eTemp, xH_data, np.log(nH_data), p0=[np.exp(0.1), -5, np.exp(20)], maxfev=200000)
fit_A = parameters[0]
fit_a = parameters[1]
fit_B = parameters[2]

print(fit_A)
print(fit_a)
print(fit_B)

r = np.logspace(0, 0.7, 1000)
ne = np.exp(eTemp(r, fit_A, fit_a, fit_B))
plt.plot(xH_data, nH_data)
plt.plot(r, ne)
plt.yscale("log")
plt.xscale("log")
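The guess itself can be read off directly with a least-squares line in log-log space (a sketch using np.polyfit on the question's data; the printed numbers are approximate):

```python
import numpy as np

xH_data = np.array([1., 1.03, 1.06, 1.1, 1.2, 1.3, 1.5, 1.7, 2., 2.6, 3., 4., 5., 6.])
nH_data = np.array([403., 316., 235., 160., 70.8, 37.6, 14.8, 7.11, 2.81,
                    0.665, 0.313, 0.090, 0.044, 0.029]) * 1.0e6

# straight line in log-log space: log(n) ~ slope * log(x) + intercept
slope, intercept = np.polyfit(np.log(xH_data), np.log(nH_data), 1)
print(slope)      # close to -5 -> starting value for a
print(intercept)  # around 19-20 -> B is roughly exp(intercept)
```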

Upvotes: 0

eandklahn

Reputation: 577

I know of two things that might help you

  1. Provide the p0 input parameter to curve_fit with a set of appropriate starting parameters to the function. That can keep the algorithm from running wild.
  2. Change the function you are fitting so that it returns np.log(n), and then fit it to np.log(nH_data). As it is now, there is a far larger penalty for not fitting the first data points than the last ones, since their values are about 10^2 larger, so the first points become "more important" to the algorithm. Taking the logarithm puts all the points on a similar scale, so they are weighted equally.

Go ahead and play around with it. I managed a pretty fine fit with these parameters

[-7.21450545e-01 -3.36131028e+00  5.97293632e+06]
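Those numbers can be sanity-checked by plugging them back into the question's model n = B (A + x)^a; this sketch assumes they are in the order (A, a, B) of the question's eTemp:

```python
import numpy as np

xH_data = np.array([1., 1.03, 1.06, 1.1, 1.2, 1.3, 1.5, 1.7, 2., 2.6, 3., 4., 5., 6.])
nH_data = np.array([403., 316., 235., 160., 70.8, 37.6, 14.8, 7.11, 2.81,
                    0.665, 0.313, 0.090, 0.044, 0.029]) * 1.0e6

# the parameters quoted above, assumed order (A, a, B) as in eTemp
A, a, B = -7.21450545e-01, -3.36131028e+00, 5.97293632e+06
model = B * (A + xH_data)**a

# largest relative deviation over the ~4 decades spanned by the data
max_rel_dev = np.max(np.abs(model / nH_data - 1.0))
print(max_rel_dev)
```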

Upvotes: 0
