I have a Python dataframe with over 50 columns that looks like this:
x1 y1 x2 y2 ... x25 y25
1.8 21.3 1.6 21.8 ... 1.9 21.7
2.6 25.4 2.7 26.3 ... 2.8 27.8
3.5 30.4 3.6 32.1 ... 3.3 33.6
I want to use np.polyfit to find the slope of each (x, y) pair, that is, slope1 = np.polyfit(x1, y1, 1)[0], ..., slope25 = np.polyfit(x25, y25, 1)[0].
I am having a hard time figuring out how to proceed. Any help would be greatly appreciated. Thank you.
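You can select the x columns (every second column, starting with the first) and the y columns (every second column, starting with the second), pair them with zip, pass each pair to np.polyfit, and collect the slopes in a list comprehension. This requires the columns to be ordered x1, y1, x2, y2, ..., so that every x column is immediately followed by its matching y column: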
import numpy as np
import pandas as pd
out = [np.polyfit(df[x], df[y], 1)[0] for x, y in zip(df.columns[::2], df.columns[1::2])]
print (out)
[5.357142857142858, 5.1112956810631225, 8.294701986754967]
Finally, wrap the result in a DataFrame if needed:
df1 = pd.DataFrame({'no': df.columns.str.extract(r'(\d+)', expand=False).drop_duplicates(),
                    'slope': out})
print (df1)
no slope
0 1 5.357143
1 2 5.111296
2 25 8.294702
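The zip approach depends on column order. As a hedged alternative sketch (not part of the original answer), the pairs can instead be matched by name, so that x<n> is always fitted against y<n> regardless of where the columns sit. The helper name `pair_slopes` is hypothetical, the sample frame contains only the pairs visible in the question (x1/y1, x2/y2, x25/y25), and it assumes every x<n> column has a matching y<n> column.

```python
import re

import numpy as np
import pandas as pd

# Sample frame from the values shown in the question; the elided
# middle columns are omitted, as in the answer's printed output.
df = pd.DataFrame({
    'x1': [1.8, 2.6, 3.5], 'y1': [21.3, 25.4, 30.4],
    'x2': [1.6, 2.7, 3.6], 'y2': [21.8, 26.3, 32.1],
    'x25': [1.9, 2.8, 3.3], 'y25': [21.7, 27.8, 33.6],
})


def pair_slopes(frame):
    """Return a DataFrame of slopes, one row per pair number.

    Hypothetical helper: pairs are matched by name (x<n> with y<n>),
    so the column order does not matter. Assumes each x<n> has a y<n>.
    """
    # Numeric suffixes of all x columns, e.g. 'x25' -> 25, sorted numerically
    nums = sorted(int(c[1:]) for c in frame.columns if re.fullmatch(r'x\d+', c))
    # Degree-1 polyfit returns [slope, intercept]; keep only the slope
    slopes = [np.polyfit(frame[f'x{n}'], frame[f'y{n}'], 1)[0] for n in nums]
    return pd.DataFrame({'no': nums, 'slope': slopes})


print(pair_slopes(df))
```

On this sample data it should give the same three slopes as above, and shuffling the column order leaves the result unchanged. Sorting the suffixes as integers also keeps pair 10 after pair 9 once all 25 pairs are present.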
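Alternatively, split each column name into its letter (x or y) and its number, build a MultiIndex from those two parts, and then group by the number, applying a custom function to each group: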
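df.columns = pd.MultiIndex.from_frame(df.columns.str.extract(r'([xy])(\d+)'))
def f(x):
    # drop the number level so the columns are just 'x' and 'y'
    x = x.droplevel(1, axis=1)
    return np.polyfit(x.x, x.y, 1)[0]
# groupby(axis=1) is deprecated since pandas 2.1, so group the transposed frame by the number level
df = df.T.groupby(level=1).apply(lambda g: f(g.T)).rename_axis('no').reset_index(name='slope')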
print (df)
no slope
0 1 5.357143
1 2 5.111296
2 25 8.294702
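A degree-1 polyfit is an ordinary least-squares line fit, whose slope has the closed form sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))**2). As a further sketch (a swapped-in technique, not from the answer above), that formula can be applied to all pairs at once with NumPy array operations instead of calling polyfit once per pair. Like the zip approach, it assumes the columns alternate x1, y1, x2, y2, ... and that there are no missing values.

```python
import numpy as np
import pandas as pd

# Same sample pairs as in the question (elided middle columns omitted)
df = pd.DataFrame({
    'x1': [1.8, 2.6, 3.5], 'y1': [21.3, 25.4, 30.4],
    'x2': [1.6, 2.7, 3.6], 'y2': [21.8, 26.3, 32.1],
    'x25': [1.9, 2.8, 3.3], 'y25': [21.7, 27.8, 33.6],
})

# Assumes alternating x, y columns: each array has shape (n_rows, n_pairs)
x = df.iloc[:, ::2].to_numpy()
y = df.iloc[:, 1::2].to_numpy()

# Centre each column, then apply the least-squares slope formula column-wise
xc = x - x.mean(axis=0)
yc = y - y.mean(axis=0)
slopes = (xc * yc).sum(axis=0) / (xc ** 2).sum(axis=0)

# Label each slope with its pair number taken from the x column name ('x25' -> '25')
out = pd.Series(slopes, index=df.columns[::2].str[1:], name='slope')
print(out)
```

The result should match the per-pair np.polyfit slopes up to floating-point rounding; the trade-off is that it only covers the linear (degree-1) case.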