Reputation: 5629
I have a set of points. Basically, I have P = f(t).
I have, let's say, 50 measurements: 50 values of P as a function of time, and these values follow an established law.
What I have to do is to find the values of the parameters in the law, that's all. Basically, I have to fit the points with the best curve. Here is the law:
P = V*t - ((V - W)*(1 - exp(-k*t)) / k)
What I need to do is to find a numeric value for V, W and k. I have t and P. Do you have an idea about how to do that?
EDIT:
Here is a screenshot of what I want to obtain:
And that's what I obtained with reptilicus's help:
https://i.sstatic.net/ocvNl.png
import numpy as np
from scipy.optimize import curve_fit
from matplotlib.pyplot import *
import xlrd

def myFunc(t, V, W, k):
    y = V * t - ((V - W) * (1 - np.exp(-k * t)) / k)
    return y

classeur = xlrd.open_workbook(path)  # path to the Excel workbook
names_sheets = classeur.sheet_names()
sheet = classeur.sheet_by_name(names_sheets[0])

row_start = 2
time = sheet.col_values(0, row_start)
fluo = sheet.col_values(4, row_start)

# drop the empty cells
time = [value for value in time if value]
fluo = [value for value in fluo if value]

x = np.array(time)
y = np.array(fluo)

# fit the data, return the best fit parameters and the covariance matrix
popt, pcov = curve_fit(myFunc, x, y)
print(popt)
print(pcov)

# plot the data
clf()  # matplotlib
plot(x, y, "rs")
# overplot the best fit curve
plot(x, myFunc(x, popt[0], popt[1], popt[2]))
grid(True)
show()
Not bad. I managed to extract the data from my Excel workbook and plot it. But as you can see, I got a linear regression, which I don't want. My goal is to reproduce the fit they got with Origin 8.
EDIT:
I have some news. The last person in my team who did this told me how he did it with Origin. They use least squares as well, but they find the parameters with a chi-squared minimization: the software iterates and optimizes the parameters.
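For reference, this kind of iterative chi-squared (sum of squared residuals) minimization can be sketched directly with scipy.optimize.least_squares; the data below are made-up stand-ins for the 50 measurements, and the starting guess is arbitrary:

```python
import numpy as np
from scipy.optimize import least_squares

def model(t, V, W, k):
    return V * t - ((V - W) * (1 - np.exp(-k * t)) / k)

def residuals(params, t, P):
    # chi-squared minimization boils down to minimizing sum(residuals**2)
    V, W, k = params
    return model(t, V, W, k) - P

# made-up stand-in data for the 50 measured points
rng = np.random.default_rng(0)
t_data = np.linspace(0.1, 4.0, 50)
P_data = model(t_data, 2.5, 1.3, 0.5) + 0.05 * rng.normal(size=t_data.size)

# iterative optimization starting from a rough guess, as Origin does
result = least_squares(residuals, x0=[1.0, 1.0, 1.0], args=(t_data, P_data))
print(result.x)  # fitted V, W, k
```

This is essentially what curve_fit does under the hood for this kind of problem.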
EDIT 2:
Because it took me so long to figure it out, I am sharing the results of my research here. The main problem I was facing was that my values were "too small". Indeed, my y values were of the order of 10^-7. As explained here: Fitting curve: why small numbers are better?, numbers of the order of 1 are better for the fit.
Moreover, in my case at least, once my data were of this order, I didn't need to give any initial parameters (by default, they are set to 1). So I just "normalized" my values. For example, I converted the time values from seconds to hours, and I multiplied my y values (of the order of 10^-7) by 10^7. Then I transformed the obtained parameters back into the desired units. Here is my code:
import numpy as np
from scipy.optimize import curve_fit
from matplotlib.pyplot import *

def myFunc(t, Vs, Vi, k):
    y = Vs * t - ((Vs - Vi) * (1 - np.exp(-k * t)) / k)
    return y

raw_x = some_input
raw_y = some_input

# scaling the data: seconds -> hours, y values (order 1e-7) -> order 1
# (keep zeros, drop empty cells)
time = [value / 3600 for value in raw_x if value or value == 0]
fluo = [value * 10**7 for value in raw_y if value or value == 0]

x = np.array(time)
y = np.array(fluo)

popt, pcov = curve_fit(myFunc, x, y, maxfev=3000)

# transform the parameters back into the original units
popt2 = [popt[0] / 3600 * 10**-7, popt[1] / 3600 * 10**-7, popt[2] / 3600]

# plot the data
clf()  # matplotlib
plot(raw_x, raw_y, "rp")
plot(raw_x, myFunc(np.array(raw_x), popt2[0], popt2[1], popt2[2]), 'b')
grid(True)
show()
And here is a picture illustrating the difference:
https://i.sstatic.net/MRvJv.png
The blue curve was fitted with the rescaled units (and the parameters were then transformed back into the original units). The green one was obtained by fitting in the original units.
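For what it's worth, an alternative to rescaling: curve_fit takes an initial guess through its p0 argument, and supplying rough order-of-magnitude starting values in the original units often lets the fit converge without any normalization. A sketch with made-up data of roughly the same orders of magnitude as mine:

```python
import numpy as np
from scipy.optimize import curve_fit

def myFunc(t, Vs, Vi, k):
    return Vs * t - ((Vs - Vi) * (1 - np.exp(-k * t)) / k)

# made-up data in the original units: t in seconds, y of order 1e-7
rng = np.random.default_rng(1)
t = np.linspace(1, 3600, 50)
y = myFunc(t, 3e-11, 1e-11, 1e-3) + 1e-9 * rng.normal(size=t.size)

# rough order-of-magnitude guesses replace the default p0 = (1, 1, 1)
popt, pcov = curve_fit(myFunc, t, y, p0=[1e-10, 1e-10, 1e-3])
print(popt)
```

With a sensible p0, the solver never has to wander from 1 down to 1e-11, which is what made the unscaled fit collapse to a near-linear solution.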
Thanks to all of you for your help.
Upvotes: 2
Views: 4320
Reputation: 10407
Just use curve_fit in scipy.optimize:
import numpy as np
from scipy.optimize import curve_fit
from pylab import *

def myFunc(t, V, W, k):
    y = V * t - ((V - W) * (1 - np.exp(-k * t)) / k)
    return y

# this generates some fake data to fit. For you, just read in the
# data from CSV or whatever you have
x = np.linspace(0, 4, 50)
y = myFunc(x, 2.5, 1.3, 0.5)
# add some noise to the fake data to make it more realistic...
yn = y + 0.2 * np.random.normal(size=len(x))

# fit the data, return the best fit parameters and the covariance matrix
popt, pcov = curve_fit(myFunc, x, yn)
print(popt)
print(pcov)

# plot the data
clf()
plot(x, yn, "rs")
# overplot the best fit curve
plot(x, myFunc(x, popt[0], popt[1], popt[2]))
grid(True)
show()
This gives something like the plot below. The red points are the (noisy) data, and the blue line is the best fit curve, with the best fitting parameters for that particular data of:
[ 2.32751132, 1.27686053, 0.65986596]
This is pretty close to the expected parameters of 2.5, 1.3, 0.5; the difference is due to the noise I added to the fake data.
Upvotes: 6
Reputation: 308968
Since you have more points than unknown coefficients, you'll want to do a least squares fit. This calculation will give you the coefficients that minimize the mean square error between the function and all the points.
So substitute your 50 points into the assumed equation and get 50 equations for 3 coefficients. This is easily expressed as a matrix:
Ax = b

where the unknown vector x contains your coefficients. Premultiply both sides by the transpose of A and solve the resulting square system with a matrix solver.
Start by plotting the data you have. The choice of equation you start with will make your job easier or harder. Are you certain about that leading term? Without it, you could take the natural log of both sides and you'd have a simple linear equation. The leading term makes it harder to tease out the coefficient for k; the solution is nonlinear in that case. Choose wisely.
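For a model that is linear in its coefficients, the normal-equations recipe above can be sketched with NumPy (using a hypothetical quadratic law y = c0 + c1*t + c2*t^2 rather than the nonlinear law in the question):

```python
import numpy as np

# made-up data following y = 1 + 2*t + 3*t**2
t = np.linspace(0, 1, 50)
b = 1 + 2 * t + 3 * t**2

# design matrix A: one row per data point, one column per coefficient
A = np.column_stack([np.ones_like(t), t, t**2])

# premultiply by A^T and solve the square system  A^T A x = A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)
print(x)  # recovers [1, 2, 3]
```

In practice, np.linalg.lstsq(A, b) is numerically safer than forming A.T @ A explicitly, but the idea is the same; the nonlinear dependence on k in the question's law is why an iterative solver like curve_fit is needed there instead.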
Upvotes: 0