Python DataFrame - Use polyfit to find the slope of every pair of columns

Question

I have a pandas DataFrame with over 50 columns that looks like this:

 x1   y1    x2   y2  ...  x25  y25
1.8  21.3  1.6  21.8 ...  1.9  21.7
2.6  25.4  2.7  26.3 ...  2.8  27.8
3.5  30.4  3.6  32.1 ...  3.3  33.6

I want to use polyfit to find the slope of each (x, y) pair of columns, i.e. slope1 = np.polyfit(x1, y1, 1)[0], ..., slope25 = np.polyfit(x25, y25, 1)[0].

I am having a hard time figuring out how to proceed. Any help would be greatly appreciated. Thank you.

jezrael · Accepted Answer

You can select the even- and odd-positioned columns, pair them with zip, and pass each pair to np.polyfit in a list comprehension. This requires the columns to be ordered so that every x column is immediately followed by its matching y column:

out = [np.polyfit(df[x], df[y], 1)[0] for x, y in zip(df.columns[::2], df.columns[1::2])]

print (out)
[5.357142857142858, 5.1112956810631225, 8.294701986754967]
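
For reference, the one-liner above can be reproduced end to end with a minimal sketch like the following. The frame is an assumption: it contains only the three column pairs visible in the question (x1/y1, x2/y2, x25/y25), not the full data set.

```python
import numpy as np
import pandas as pd

# Sample frame (assumption): only the three column pairs shown in the question.
df = pd.DataFrame({
    'x1': [1.8, 2.6, 3.5], 'y1': [21.3, 25.4, 30.4],
    'x2': [1.6, 2.7, 3.6], 'y2': [21.8, 26.3, 32.1],
    'x25': [1.9, 2.8, 3.3], 'y25': [21.7, 27.8, 33.6],
})

# Columns at even positions are x, columns at odd positions are y.
# np.polyfit(x, y, 1) returns [slope, intercept], so [0] picks the slope.
out = [np.polyfit(df[x], df[y], 1)[0]
       for x, y in zip(df.columns[::2], df.columns[1::2])]
print(out)
```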

Finally, pass the result to the DataFrame constructor if necessary:

df1 = pd.DataFrame({'no': df.columns.str.extract(r'(\d+)', expand=False).drop_duplicates(),
                    'slope': out})
print (df1)
   no     slope
0   1  5.357143
1   2  5.111296
2  25  8.294702
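
If the columns are not guaranteed to alternate x, y in order, one option is to pair them by name instead of by position. The sketch below assumes every column is named exactly x<number> or y<number> and that each x column has a matching y column. The shuffled sample frame and the names suffixes and slopes are illustrative only.

```python
import re

import numpy as np
import pandas as pd

# Sample frame (assumption): same values as in the question, columns shuffled
# on purpose to show that the pairing does not rely on column order.
df = pd.DataFrame({
    'y2': [21.8, 26.3, 32.1], 'x1': [1.8, 2.6, 3.5],
    'y25': [21.7, 27.8, 33.6], 'x2': [1.6, 2.7, 3.6],
    'y1': [21.3, 25.4, 30.4], 'x25': [1.9, 2.8, 3.3],
})

# Numeric suffix of every column named x<number>, sorted numerically.
suffixes = sorted(int(c[1:]) for c in df.columns if re.fullmatch(r'x\d+', c))

# Look up each pair by name and keep only the slope of the degree-1 fit.
slopes = pd.Series(
    {n: np.polyfit(df[f'x{n}'], df[f'y{n}'], 1)[0] for n in suffixes},
    name='slope',
).rename_axis('no')
print(slopes.reset_index())
```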

Alternatively, build a MultiIndex by splitting each column name into its letter (x or y) and its number, then apply a custom function to each group with groupby:

df.columns = pd.MultiIndex.from_frame(df.columns.str.extract(r'([xy])(\d+)'))


def f(x):
    x = x.droplevel(1, axis=1)
    return np.polyfit(x.x, x.y, 1)[0]

df = df.groupby(axis=1, level=1).apply(f).rename_axis('no').reset_index(name='slope')
print (df)
   no     slope
0   1  5.357143
1   2  5.111296
2  25  8.294702
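
Note that groupby with axis=1 is deprecated in newer pandas releases (2.1 and later, as far as I know), so the snippet above may emit a warning or fail there. For a straight-line fit the slope also has a closed form, slope = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))**2), which can be evaluated for all pairs at once without groupby. The sketch below is a swapped-in alternative, not part of the original answer, and again assumes the columns alternate x, y. The sample frame and the name slopes_vec are illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame (assumption): only the three column pairs shown in the question.
df = pd.DataFrame({
    'x1': [1.8, 2.6, 3.5], 'y1': [21.3, 25.4, 30.4],
    'x2': [1.6, 2.7, 3.6], 'y2': [21.8, 26.3, 32.1],
    'x25': [1.9, 2.8, 3.3], 'y25': [21.7, 27.8, 33.6],
})

X = df.iloc[:, ::2].to_numpy()   # all x columns, shape (rows, pairs)
Y = df.iloc[:, 1::2].to_numpy()  # all y columns, same shape

# Center each column, then apply the least-squares slope formula per pair.
dx = X - X.mean(axis=0)
dy = Y - Y.mean(axis=0)
slopes_vec = (dx * dy).sum(axis=0) / (dx ** 2).sum(axis=0)
print(slopes_vec)
```

This gives the same slopes as np.polyfit with degree 1, up to floating-point rounding, but it does not generalize to higher-degree fits.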
