Reputation: 2183
I have the following example dataframe:
import pandas as pd
d = {'col1': [2, 5, 6, 5, 4, 6, 7, 8, 9, 7, 5]}
df = pd.DataFrame(data=d)
print(df)
Output:
col1
0 2
1 5
2 6
3 5
4 4
5 6
6 7
7 8
8 9
9 7
10 5
I need to calculate the slope over the previous N rows of col1 (the current row included) and save it in a separate column (call it slope). The desired output might look like the following. (The slope values below are arbitrary placeholders, not real results.)
col1 slope
0 2
1 5
2 6
3 5
4 4 3
5 6 4
6 7 5
7 8 2
8 9 4
9 7 6
10 5 5
So, in the row with the index number 4, the slope is 3 and it is the slope of [2, 5, 6, 5, 4].
Is there an elegant way of doing this without using a for loop?
ADDENDUM:
Based on the accepted answer below, if you get the following error:
TypeError: ufunc 'true_divide' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
the index of your dataframe is probably not numeric. The following modification, which regresses against the row positions 0 to 4 instead of the index values, works in that case:
df['slope'] = df['col1'].rolling(5).apply(lambda s: linregress(range(5), s.values)[0])
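To illustrate the modification, here is a minimal sketch that assumes a dataframe whose index consists of string labels. The helper name rolling_slope and the labels are hypothetical and not part of the original question. Each window is regressed against the positions 0..N-1, so the index type should no longer matter.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress

def rolling_slope(series, window):
    """Least-squares slope over each trailing window (hypothetical helper)."""
    x = np.arange(window)  # row positions, independent of the index
    # raw=True passes each window to the lambda as a plain NumPy array
    return series.rolling(window).apply(lambda y: linregress(x, y).slope, raw=True)

# Assumed example: the same col1 values, but with a non-numeric index
labels = list("abcdefghijk")
df = pd.DataFrame({'col1': [2, 5, 6, 5, 4, 6, 7, 8, 9, 7, 5]}, index=labels)
df['slope'] = rolling_slope(df['col1'], 5)
print(df)
```

The first four rows have no full window and come out as NaN, the same as with the numeric index.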
Upvotes: 2
Views: 3366
Reputation: 260455
You can use rolling + apply and scipy.stats.linregress:
from scipy.stats import linregress
df['slope'] = df['col1'].rolling(5).apply(lambda s: linregress(s.reset_index())[0])
print(df)
output:
col1 slope
0 2 NaN
1 5 NaN
2 6 NaN
3 5 NaN
4 4 0.4
5 6 0.0
6 7 0.3
7 8 0.9
8 9 1.2
9 7 0.4
10 5 -0.5
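As a sanity check, the value at index 4 can be reproduced by hand from the usual least-squares slope formula, using the window [2, 5, 6, 5, 4] from the question. This is only a sketch with the positions 0..4 assumed as x values; it is not how linregress is implemented internally.

```python
import numpy as np

y = np.array([2, 5, 6, 5, 4], dtype=float)  # first full window of col1
x = np.arange(len(y), dtype=float)          # positions 0..4

# Least-squares slope: sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
dx = x - x.mean()
dy = y - y.mean()
slope = (dx * dy).sum() / (dx ** 2).sum()
print(slope)
```

The numerator works out to 4 and the denominator to 10, which gives the 0.4 shown in the output above (up to floating-point rounding).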
Upvotes: 5
Reputation: 323226
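Let us do it with numpy. Note that np.polyfit returns the coefficients with the highest degree first, so the slope is the first element of the fit. Indexing np.poly1d(fit)[0] would return the coefficient of x**0, which is the intercept, not the slope.
import numpy as np
def slope_numpy(x, y):
    # A degree-1 fit returns [slope, intercept]
    slope, intercept = np.polyfit(x, y, 1)
    return slope
df.col1.rolling(5).apply(lambda y: slope_numpy(range(5), y))
Output (up to floating-point rounding):
0 NaN
1 NaN
2 NaN
3 NaN
4 0.4
5 0.0
6 0.3
7 0.9
8 1.2
9 0.4
10 -0.5
Name: col1, dtype: float64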
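If the per-window apply call is too slow on a large dataframe, the rolling slope can also be computed in one vectorized step. The following is a sketch, not a drop-in replacement: it assumes NumPy 1.20 or newer (for sliding_window_view), no missing values, and at least as many rows as the window size. The helper name rolling_slope_vectorized is hypothetical.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def rolling_slope_vectorized(values, window):
    """Trailing-window least-squares slopes without a Python-level loop (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    # Centered positions; they sum to zero, so sum(dx * y) equals sum(dx * (y - mean_y))
    dx = np.arange(window) - (window - 1) / 2
    windows = sliding_window_view(values, window)  # shape (n - window + 1, window)
    slopes = windows @ dx / (dx ** 2).sum()
    # Pad the first window - 1 rows with NaN, as rolling() does
    return np.concatenate([np.full(window - 1, np.nan), slopes])

df = pd.DataFrame({'col1': [2, 5, 6, 5, 4, 6, 7, 8, 9, 7, 5]})
df['slope'] = rolling_slope_vectorized(df['col1'], 5)
print(df)
```

Because the x positions are the same for every window, the slope reduces to a dot product of each window with the centered positions, divided by a constant.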
Upvotes: 3