MML

Reputation: 31

How can I extract x and y pairs from a pandas dataframe to use in symfit?

I am using pandas to read in .csv files. I then take the x and y pairs from the dataframe and use symfit to perform a global fit on the data. I am new to pandas dataframes and to symfit. My current proof-of-concept code works for two data sets, but I want to write it so that it works for however many data sets are imported from the original .csv file. The file will always be in the same format: columns are always pairs of x and y values in the order x1, y1, x2, y2, etc.

Can I iterate through the dataframe and pull out individual arrays for x1, y1, x2, y2, etc.? Does that defeat the purpose of using a dataframe?

    # creating the dataframe

    from pandas import read_csv, Series, DataFrame, isnull

    # no data in first two rows; these contain information I use later on for plotting
    data_file = read_csv(filename, header=None, skiprows=2)

    # important note: data sets contain different numbers of points, so pandas reads in nan for any missing values.

    # the isnull masks remove any nan values (up for any suggestions on a better way to do this.
    # Other methods I have tried remove entire rows or columns that contain nan)
    X1 = Series(data_file[0]).values
    X1 = X1[~isnull(X1)]

    Y1 = Series(data_file[1]).values
    Y1 = Y1[~isnull(Y1)]

    X2 = Series(data_file[2]).values
    X2 = X2[~isnull(X2)]

    Y2 = Series(data_file[3]).values
    Y2 = Y2[~isnull(Y2)]

    # sample data 
    # X1 = [12.5, 6.7, 5, 3.1, 128, 47, 5, 3.1, 6.7, 12.5]
    # Y1 = [280, 150, 127, 85, 400, 401, 110, 96, 131, 241]
    # X2 = [75, 39, 10, 7.7, 19, 39, 75]
    # Y2 = [296, 257, 141, 100, 181, 254, 324] 

From here I pass the X and Y arrays to a class that contains symfit's model and fitting functions. I don't think I can concatenate X and Y; I need them to stay separate so symfit will fit a separate curve for each data set (with four shared parameters).

Below is the model I am using. I might be butchering symfit's syntax. I'm still learning about symfit, but it's been wonderful so far. This fit works for two data sets, and I'm able to extract the fit parameters and plot the results later on.

    # This model assumes two data sets. I need to figure out how to fit as many as 10 data sets.

    from symfit import parameters, variables, Fit, Model

    # These parameters change with each x,y pair. These will also be read from
    # the original data file. I have them hard-coded here for ease.
    fi_1 = 0
    fi_2 = 1

    x_1, x_2, y_1, y_2 = variables('x_1, x_2, y_1, y_2')

    vmax, km, evk, ev = parameters('vmax, km, evk, ev') # these are all shared

    model = Model({
        y_1: vmax * x_1 / (km * (1 + (fi_1 * evk)) + x_1 * (1 + (fi_1 * ev))),
        y_2: vmax * x_2 / (km * (1 + (fi_2 * evk)) + x_2 * (1 + (fi_2 * ev)))})

    fit = Fit(model, x_1=X1, x_2=X2, y_1=Y1, y_2=Y2)
    fit_result = fit.execute()

PROBLEM SUMMARY: I could have as many as 10 x, y pairs to fit simultaneously. Is there a clean way to iterate through the dataframe so I avoid hard-coding the x and y arrays that are passed on to symfit?

Upvotes: 2

Views: 2135

Answers (1)

MML

Reputation: 31

It turns out that it was a LOT easier than I thought. I am able to restructure the input .csv file so that there is one column for x values, one for y values, and one for fi, the parameter that changes between data sets. So all the x,y pairs that belong together have a corresponding value of fi. For example, fi = 0 for all the x,y pairs in the first data set, and as soon as the second data set begins, fi = 1. I can expand it nicely for any number of x,y pairs with a different value for fi. Now I'm able to use the dataframe efficiently:

    data_file = read_csv(filename, header=None, skiprows=1) # first row contains column labels now

Here's the simplified model:

    from symfit import parameters, variables, Fit, Model

    x, y, fi = variables('x, y, fi') # set variables
    vmax, km, evk, ev = parameters('vmax, km, evk, ev') # set shared parameters

    model = Model({y: vmax * x / (km * (1 + (fi * evk)) + x * (1 + (fi * ev)))})

    # column 0 holds x, column 1 holds y, column 2 holds fi
    fit = Fit(model, x=data_file[0], y=data_file[1], fi=data_file[2])

    fit_result = fit.execute()

This works and is much cleaner than what I thought it would end up being. Restructuring the input files to simplify data import helps a lot!
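
If restructuring the input file by hand is not an option, the same three-column layout can probably be built from the original x1, y1, x2, y2 format inside pandas. Below is a minimal sketch of that idea, not a tested part of my workflow. The inline CSV text, the `fi_values` list, and the name `tidy` are made up for illustration, and the sketch assumes that column `2*i` holds x and column `2*i + 1` holds y for the i-th data set. Calling `dropna()` on one pair at a time removes only that pair's padding rows, which avoids dropping whole rows of the table.

```python
import io

import pandas as pd

# Hypothetical wide-format data: columns are x1, y1, x2, y2. The second
# data set is one point shorter, so its last cells are empty (read as NaN).
wide_csv = io.StringIO(
    "12.5,280,75,296\n"
    "6.7,150,39,257\n"
    "5,127,10,141\n"
    "3.1,85,,\n"
)
wide = pd.read_csv(wide_csv, header=None)

# Hypothetical fi values, one per x,y pair, in column order.
fi_values = [0, 1]

frames = []
for i, fi_value in enumerate(fi_values):
    # Select the i-th x,y pair and drop only the rows that are padding
    # for this pair.
    pair = wide.iloc[:, [2 * i, 2 * i + 1]].dropna()
    frames.append(
        pd.DataFrame(
            {
                "x": pair.iloc[:, 0].to_numpy(),
                "y": pair.iloc[:, 1].to_numpy(),
                "fi": fi_value,  # scalar is repeated for every row of the pair
            }
        )
    )

# Stack all pairs into one long table with columns x, y, fi.
tidy = pd.concat(frames, ignore_index=True)
print(tidy)
```

The columns of `tidy` could then be passed to the single-equation fit in the same way as above, for example `Fit(model, x=tidy["x"], y=tidy["y"], fi=tidy["fi"])`.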

Upvotes: 1
