Reputation: 31
I am using pandas to read in .csv files. I then take the x and y pairs from the dataframe and use symfit to perform a global fit on the data. I am new to pandas dataframes and to symfit. My current proof-of-concept code works for two data sets, but I want to write it in a way that will work for however many data sets are imported from the original .csv file, which will always be in the same format: columns will always be pairs of x and y values in the format x1, y1, x2, y2, etc.
Can I iterate through the dataframe and pull out individual arrays for x1, y1, x2, y2, etc.? Does that defeat the purpose of using a dataframe?
# creating the dataframe
from pandas import read_csv, Series, DataFrame, isnull
data_file = read_csv(filename, header=None, skiprows=2) # no data in first two rows--these contain information I use later on for plotting
# important note: data sets contain different numbers of points, so pandas reads in nan for any missing values.
X1 = Series(data_file[0]).values
X1 = X1[~isnull(X1)] # removes any nan values (up for any suggestions on a better way to do this. Other methods I have tried remove entire rows or columns that contain nan)
Y1 = Series(data_file[1]).values
Y1 = Y1[~isnull(Y1)]
X2 = Series(data_file[2]).values
X2 = X2[~isnull(X2)]
Y2 = Series(data_file[3]).values
Y2 = Y2[~isnull(Y2)]
# sample data
# X1 = [12.5, 6.7, 5, 3.1, 128, 47, 5, 3.1, 6.7, 12.5]
# Y1 = [280, 150, 127, 85, 400, 401, 110, 96, 131, 241]
# X2 = [75, 39, 10, 7.7, 19, 39, 75]
# Y2 = [296, 257, 141, 100, 181, 254, 324]
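One possible way to avoid hard-coding each column is to loop over adjacent column pairs and drop the nan padding one pair at a time, so a short data set does not truncate a longer one. This is only a sketch: `extract_pairs` is a hypothetical helper name, and the inline CSV text is a stand-in for the real file that reuses the sample data above with two placeholder rows for the rows skipped by `skiprows=2`. It assumes x and y within a pair are missing on the same rows.

```python
from io import StringIO

import pandas as pd

# Stand-in for the real file (assumption): two placeholder info rows, then the
# sample data. Data set 2 is shorter, so its columns end in empty fields.
CSV_TEXT = """info,,,
info,,,
12.5,280,75,296
6.7,150,39,257
5,127,10,141
3.1,85,7.7,100
128,400,19,181
47,401,39,254
5,110,75,324
3.1,96,,
6.7,131,,
12.5,241,,
"""


def extract_pairs(df):
    """Return a list of (x, y) NumPy arrays, one per pair of adjacent columns."""
    pairs = []
    for i in range(0, df.shape[1] - 1, 2):
        # Select only this pair, then drop the rows where it contains nan.
        # Other pairs are untouched, so no data is lost from longer data sets.
        pair = df.iloc[:, [i, i + 1]].dropna()
        pairs.append((pair.iloc[:, 0].to_numpy(), pair.iloc[:, 1].to_numpy()))
    return pairs


df = pd.read_csv(StringIO(CSV_TEXT), header=None, skiprows=2)
pairs = extract_pairs(df)

for n, (x, y) in enumerate(pairs, start=1):
    print(f"data set {n}: {len(x)} points")
```

Calling `dropna()` on a two-column selection rather than on the whole dataframe is what keeps it from removing entire rows of the other data sets.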
From here I pass the X and Y arrays to a class that contains symfit's model and fitting functions. I don't think I can concatenate X and Y; I need them to stay separate so symfit will fit separate curves for each data set (with four shared parameters).
Below is the model I am using. I might be butchering symfit's syntax. I'm still learning about symfit, but it's been wonderful so far. This fit works for two data sets, and I'm able to extract the fit parameters and plot the results later on.
# This model assumes two data sets. I need to figure out how to fit as many as 10 data sets.
from symfit import parameters, variables, Fit, Model
fi_1 = 0 # These parameters change with each x,y pair. These will also be read from the original data file. I have them hard-coded here for ease.
fi_2 = 1
x_1, x_2, y_1, y_2 = variables('x_1, x_2, y_1, y_2')
vmax, km, evk, ev = parameters('vmax, km, evk, ev') # these are all shared
model = Model({
y_1: vmax * x_1 / (km * (1 + (fi_1 * evk)) + x_1 * (1 + (fi_1 * ev))),
y_2: vmax * x_2 / (km * (1 + (fi_2 * evk)) + x_2 * (1 + (fi_2 * ev)))})
fit = Fit(model, x_1=X1, x_2=X2, y_1=Y1, y_2=Y2)
fit_result = fit.execute()
PROBLEM SUMMARY: I could have as many as 10 x, y pairs to fit simultaneously. Is there a clean way to iterate through the dataframe so I avoid hard-coding the x and y arrays that are passed on to symfit?
Upvotes: 2
Views: 2135
Reputation: 31
It turns out that it was a LOT easier than I thought. I am able to restructure the input .csv file so that there is one column for x values, one for y values, and one for fi, the parameter that changes between data sets. So all the x,y pairs that belong together have a corresponding value of fi. For example, fi = 0 for all the x,y pairs in the first data set, and as soon as the second data set begins, fi = 1. I can expand it nicely for any number of x,y pairs with a different value for fi. Now I'm able to use the dataframe efficiently:
data_file = read_csv(filename, header=None, skiprows=1) #first row contains column labels now
Here's the simplified model:
x, y, fi = variables('x, y, fi') # set variables
vmax, km, evk, ev = parameters('vmax, km, evk, ev') # set shared parameters
model = Model({y: vmax * x / (km * (1 + (fi * evk)) + x * (1 + (fi * ev)))})
fit = Fit(model, x=data_file[0], y=data_file[1], fi=data_file[2])
fit_result = fit.execute()
This works and is much cleaner than what I thought it would end up being. Restructuring the input files to simplify data import helps a lot!
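If the input files cannot be changed by hand, the same long layout can probably be produced from the original x1, y1, x2, y2 columns inside pandas. The following is a minimal sketch under that assumption: `wide_to_long` is a hypothetical helper name, `fi_values` is assumed to hold one fi per column pair, and the small `wide` frame is a truncated piece of the sample data from the question.

```python
import numpy as np
import pandas as pd


def wide_to_long(df, fi_values):
    """Stack x/y column pairs into one frame with columns x, y and fi."""
    frames = []
    for k, fi in enumerate(fi_values):
        # Take the k-th pair of columns and drop its nan padding only.
        pair = df.iloc[:, [2 * k, 2 * k + 1]].dropna()
        frames.append(pd.DataFrame({
            'x': pair.iloc[:, 0].to_numpy(),
            'y': pair.iloc[:, 1].to_numpy(),
            'fi': fi,  # scalar, repeated for every row of this data set
        }))
    return pd.concat(frames, ignore_index=True)


# First rows of the sample data; the second data set is one point shorter.
wide = pd.DataFrame({
    0: [12.5, 6.7, 5.0],
    1: [280, 150, 127],
    2: [75, 39, np.nan],
    3: [296, 257, np.nan],
})

long = wide_to_long(wide, fi_values=[0, 1])
print(long)
```

The resulting columns could then be passed to the simplified model in the same way as above, for example `Fit(model, x=long['x'], y=long['y'], fi=long['fi'])`.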
Upvotes: 1