pamplemoose

Reputation: 25

Apply curve_fit on dataframe columns

I have a pandas.DataFrame with multiple columns and I would like to apply a curve_fit function to each of them. I would like the output to be a DataFrame containing the optimal parameter values fitted to the data in each column (for now, I am not interested in their covariance).

The df has the following structure:

    a  b  c
0   0  0  0
1   0  0  0
2   0  0  0
3   0  0  0
4   0  0  0
5   0  0  0
6   1  0  1
7   1  1  1
8   1  1  1
9   1  1  1
10  1  1  1
11  1  1  1
12  1  1  1
13  1  1  1
14  2  1  2
15  6  2  6
16  7  2  7
17  8  2  8
18  9  2  9
19  7  2  7
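To reproduce the problem, the frame above can be rebuilt with a short sketch like the following. The list shortcuts are just one way of typing it in; the values are read directly off the table (note that columns a and c are identical).

```python
import pandas as pd

# Values read off the table: 20 rows, index 0..19
a = [0] * 6 + [1] * 8 + [2, 6, 7, 8, 9, 7]
b = [0] * 7 + [1] * 8 + [2] * 5
df = pd.DataFrame({"a": a, "b": b, "c": list(a)})
print(df.shape)
```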

I have defined a function to fit to the data as follows:

import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, a, x0, k):
    y = a / (1 + np.exp(-k*(x-x0)))
    return y

def fitdata(dataseries):
    popt, pcov = curve_fit(sigmoid, dataseries.index, dataseries)
    return popt

I can apply the function and get an array in return:

result_a=fitdata(df['a'])
In []: result_a
Out[]: array([  8.04197008,  14.48710063,   1.51668241])

If I try to use df.apply with the function, I get the following error:

fittings=df.apply(fitdata)
ValueError: Shape of passed values is (3, 3), indices imply (3, 20)

Ultimately I would like the output to look like:

           a          b          c
0   8.041970   2.366496   8.041970
1  14.487101  12.006009  14.487101
2   1.516682   0.282359   1.516682

Can this be done with something similar to apply?

Upvotes: 1

Views: 3871

Answers (3)

kerel

Reputation: 9

(This post builds on both previous answers and provides a complete example, including an improved way of building the DataFrame of fit parameters.)

The following function, fit_to_dataframe, fits an arbitrary function to each column of your data and returns the fit parameters (the covariance is ignored here) in a convenient format:

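import pandas as pd
from scipy.optimize import curve_fit

def fit_to_dataframe(df, function, parameter_names):
    popts = {}
    pcovs = {}

    # Fit each column separately, using the index as the x values
    for c in df.columns:
        popts[c], pcovs[c] = curve_fit(function, df.index, df[c])

    # One row per column of df, one column per fit parameter
    fit_parameters = pd.DataFrame.from_dict(popts,
                                            orient='index',
                                            columns=parameter_names)
    return fit_parameters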

fit_parameters = fit_to_dataframe(df, sigmoid, parameter_names=['a', 'x0', 'k'])

The fit parameters are available in the following form:

          a         x0         k
a  8.869996  11.714575  0.844969
b  2.366496  12.006009  0.282359
c  8.041970  14.487101  1.516682
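fit_to_dataframe already collects the covariance matrices in pcovs but then discards them. If the covariance becomes relevant later, a minimal sketch could turn the diagonal of each pcov into one-standard-deviation parameter errors, which is the conversion suggested in the SciPy curve_fit documentation. The name fit_with_errors is hypothetical, chosen here for illustration.

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit


def sigmoid(x, a, x0, k):
    return a / (1 + np.exp(-k * (x - x0)))


def fit_with_errors(df, function, parameter_names):
    """Hypothetical variant of fit_to_dataframe that also keeps one-sigma errors."""
    values, errors = {}, {}
    for c in df.columns:
        popt, pcov = curve_fit(function, df.index.values, df[c].values)
        values[c] = popt
        # One standard deviation per parameter: square root of the covariance diagonal
        errors[c] = np.sqrt(np.diag(pcov))
    fit_parameters = pd.DataFrame.from_dict(values, orient="index", columns=parameter_names)
    fit_errors = pd.DataFrame.from_dict(errors, orient="index", columns=parameter_names)
    return fit_parameters, fit_errors


# Example data from the question (columns a and c are identical)
a = [0] * 6 + [1] * 8 + [2, 6, 7, 8, 9, 7]
b = [0] * 7 + [1] * 8 + [2] * 5
df = pd.DataFrame({"a": a, "b": b, "c": list(a)})

fit_parameters, fit_errors = fit_with_errors(df, sigmoid, ["a", "x0", "k"])
print(fit_parameters)
print(fit_errors)
```

Both frames share the same layout, so fit_errors.loc["b", "k"] is the uncertainty of fit_parameters.loc["b", "k"].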

To inspect the fit result, you can use the following function to plot the results:

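import numpy as np
import matplotlib.pyplot as plt

def plot_fit_results(df, function, fit_parameters):
    NUM_POINTS = 50
    # Dense x grid spanning the data, for drawing smooth fitted curves
    t = np.linspace(df.index.values.min(), df.index.values.max(), NUM_POINTS)
    df.plot(style='.')
    for idx, column in enumerate(df.columns):
        # Color 'C0', 'C1', ... matches the default color cycle used by df.plot
        plt.plot(t,
                 function(t, *fit_parameters.loc[column]),
                 color='C{}'.format(idx))
    plt.show()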

plot_fit_results(df, sigmoid, fit_parameters)

Result: [figure: output graph]

This answer is also available as an interactive Jupyter notebook here.

Upvotes: 0

PlagTag

Reputation: 6429

I think the issue is that applying your fitting function returns a 3x3 array (the three fit parameters for each of the three columns, as returned in Conner Chang's answer), whereas pandas expects something with the same 20x3 shape as your df.

So you have to re-apply your fit function with these parameters to get fitted y-values in the shape of the df.

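import pandas as pd
from scipy.optimize import curve_fit

def fitdata(dataseries):
    # fit the data, using the index as the x values
    fitParams, fitCovariances = curve_fit(sigmoid, dataseries.index, dataseries)
    # Re-apply the function with the coefficients at the original x values (the index),
    # so that we get fitted data in the shape of the df again.
    y_fit = sigmoid(dataseries.index.values, *fitParams)
    return pd.Series(y_fit, index=dataseries.index)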
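With a fitdata that evaluates the sigmoid at the index, df.apply no longer hits a shape mismatch, because every column now returns 20 values. A self-contained sketch, where the residual check at the end is an addition for illustration and not part of the original answer:

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit


def sigmoid(x, a, x0, k):
    return a / (1 + np.exp(-k * (x - x0)))


def fitdata(dataseries):
    x = dataseries.index.values
    fitParams, _ = curve_fit(sigmoid, x, dataseries.values)
    # Fitted y-values at the original x positions, indexed like the input
    return pd.Series(sigmoid(x, *fitParams), index=dataseries.index)


# Example data from the question (columns a and c are identical)
a = [0] * 6 + [1] * 8 + [2, 6, 7, 8, 9, 7]
b = [0] * 7 + [1] * 8 + [2] * 5
df = pd.DataFrame({"a": a, "b": b, "c": list(a)})

fitted = df.apply(fitdata)      # same 20 x 3 shape as df
residuals = df - fitted         # data minus fit, column by column
ssr = (residuals ** 2).sum()    # sum of squared residuals per column
print(ssr)
```

Because fitted has the same index and columns as df, the subtraction aligns row by row without any reshaping.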

Have a look here for more examples
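If, as in the question, the goal is the parameters themselves rather than fitted values, another option is to return a labelled pd.Series from the fitting function: DataFrame.apply then assembles the per-column Series into a DataFrame with one row per parameter. A minimal sketch; fit_params and the row labels are names chosen here for illustration.

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit


def sigmoid(x, a, x0, k):
    return a / (1 + np.exp(-k * (x - x0)))


def fit_params(dataseries):
    popt, _ = curve_fit(sigmoid, dataseries.index.values, dataseries.values)
    # A labelled Series (rather than a bare array) lets apply stack the
    # results as rows "a", "x0", "k", one column per original column
    return pd.Series(popt, index=["a", "x0", "k"])


# Example data from the question (columns a and c are identical)
a = [0] * 6 + [1] * 8 + [2, 6, 7, 8, 9, 7]
b = [0] * 7 + [1] * 8 + [2] * 5
df = pd.DataFrame({"a": a, "b": b, "c": list(a)})

params = df.apply(fit_params)   # 3 parameter rows x 3 data columns
print(params)
```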

Upvotes: 1

Conner Chang

Reputation: 40

Hope my solution works for you.

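import pandas as pd

result = pd.DataFrame()
for i in df.columns:
    # Fit column i and append its parameters as a new column
    frames = [result, pd.DataFrame(fitdata(df[i]))]
    result = pd.concat(frames, axis=1)
# Every appended column is named 0, so restore the original column names
result.columns = df.columns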
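Growing result with pd.concat inside the loop works, but the per-column parameter arrays can also be collected in a dict and turned into the same frame in one step, which keeps the original column names automatically. A minimal sketch reusing the question's sigmoid and fitdata:

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit


def sigmoid(x, a, x0, k):
    y = a / (1 + np.exp(-k * (x - x0)))
    return y


def fitdata(dataseries):
    popt, pcov = curve_fit(sigmoid, dataseries.index, dataseries)
    return popt


# Example data from the question (columns a and c are identical)
a = [0] * 6 + [1] * 8 + [2, 6, 7, 8, 9, 7]
b = [0] * 7 + [1] * 8 + [2] * 5
df = pd.DataFrame({"a": a, "b": b, "c": list(a)})

# One fit per column; dict keys become the column names of the result
result = pd.DataFrame({c: fitdata(df[c]) for c in df.columns})
print(result)
```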

           a           b           c
0   8.041970    2.366496    8.041970
1   14.487101   12.006009   14.487101
2   1.516682    0.282359    1.516682

Upvotes: 1
