How to calculate slope of Pandas dataframe column based on previous N rows

Question

I have the following example dataframe:

import pandas as pd

d = {'col1': [2, 5, 6, 5, 4, 6, 7, 8, 9, 7, 5]}

df = pd.DataFrame(data=d)
print(df)

Output:

    col1
0      2
1      5
2      6
3      5
4      4
5      6
6      7
7      8
8      9
9      7
10     5

I need to calculate the slope of col1 over the previous N rows and store it in a separate column (call it slope). The desired output might look like the following (the slope values below are arbitrary placeholders, not real results):

       col1  slope
0      2
1      5
2      6
3      5
4      4     3
5      6     4
6      7     5
7      8     2
8      9     4
9      7     6
10     5     5

So, in the row with index 4, the slope is 3, and it is the slope of the five values [2, 5, 6, 5, 4] (that is, N = 5).

Is there an elegant way of doing this without using a for loop?

ADDENDUM:

If the accepted answer below raises the following error:

TypeError: ufunc 'true_divide' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

the index of your dataframe is probably not numerical. The following modification, which uses the position within the window as x instead of the index, makes it work:

df['slope'] = df['col1'].rolling(5).apply(lambda s: linregress(range(5), s.values)[0])
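
To illustrate the modification above, here is a minimal sketch on the same data with a non-numerical index. The date index and the variable name n are assumptions made for the example and are not part of the original question.

```python
import pandas as pd
from scipy.stats import linregress

n = 5  # window length (assumed for illustration)

# Same values as in the question, but indexed by dates instead of 0..10
# (the date index is an assumption made for this example)
idx = pd.date_range('2021-01-01', periods=11, freq='D')
df = pd.DataFrame({'col1': [2, 5, 6, 5, 4, 6, 7, 8, 9, 7, 5]}, index=idx)

# x is the position inside each window (0..n-1), so the index type
# never reaches linregress
df['slope'] = df['col1'].rolling(n).apply(
    lambda s: linregress(range(n), s.values).slope
)
print(df)
```

Because x is always 0..n-1, the slope is expressed per row, not per unit of the index.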

mozway · Accepted Answer

You can use rolling+apply and scipy.stats.linregress:

from scipy.stats import linregress

df['slope'] = df['col1'].rolling(5).apply(lambda s: linregress(s.reset_index())[0])

print(df)

output:

    col1  slope
0      2    NaN
1      5    NaN
2      6    NaN
3      5    NaN
4      4    0.4
5      6    0.0
6      7    0.3
7      8    0.9
8      9    1.2
9      7    0.4
10     5   -0.5
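
As a possible variation (not part of the answer above), the same rolling slope can be sketched with numpy.polyfit and a configurable window length. The helper name rolling_slope and its parameter n are hypothetical, introduced only for this example.

```python
import numpy as np
import pandas as pd

def rolling_slope(series, n):
    """Least-squares slope over each window of n consecutive values.

    Hypothetical helper for illustration; x is the position inside the
    window, so the result does not depend on the index of the series.
    """
    x = np.arange(n)
    # raw=True passes each window to the function as a plain NumPy array;
    # np.polyfit(x, y, 1) returns [slope, intercept]
    return series.rolling(n).apply(lambda y: np.polyfit(x, y, 1)[0], raw=True)

df = pd.DataFrame({'col1': [2, 5, 6, 5, 4, 6, 7, 8, 9, 7, 5]})
df['slope'] = rolling_slope(df['col1'], 5)
print(df)
```

With n = 5 this should reproduce the slope column shown above, up to floating-point rounding.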
