edn

Reputation: 2183

How to calculate slope of Pandas dataframe column based on previous N rows

I have the following example dataframe:

import pandas as pd

d = {'col1': [2, 5, 6, 5, 4, 6, 7, 8, 9, 7, 5]}

df = pd.DataFrame(data=d)
print(df)

Output:

       col1
0      2
1      5
2      6
3      5
4      4
5      6
6      7
7      8
8      9
9      7
10     5

I need to calculate the slope of the previous N rows of col1 and save it in a separate column (call it slope). The desired output might look like the following (the slope values below are arbitrary placeholders, just for the sake of example):

       col1  slope
0      2
1      5
2      6
3      5
4      4     3
5      6     4
6      7     5
7      8     2
8      9     4
9      7     6
10     5     5

For example, with N = 5, the row with index 4 has slope 3, which is the slope of the window [2, 5, 6, 5, 4].
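The question does not define "slope" precisely. Assuming it means the ordinary least-squares slope of the window values against their positions 0..N-1, here is a minimal sketch of that calculation for the first window:

```python
import numpy as np

# First window from the question (N = 5), regressed against positions 0..4
y = np.array([2, 5, 6, 5, 4], dtype=float)
x = np.arange(len(y), dtype=float)

# Least-squares slope: sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
slope = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
print(round(slope, 6))  # 0.4
```

Under this definition, row 4 would hold 0.4 rather than the placeholder 3.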

Is there an elegant way of doing this without a for loop?


ADDENDUM:

If the accepted answer below raises the following error:

TypeError: ufunc 'true_divide' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

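then the index of your DataFrame is probably not numeric (for example, a DatetimeIndex). The following modification, which regresses each window against the fixed positions range(5) instead of the index values, makes it work: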

from scipy.stats import linregress

df['slope'] = df['col1'].rolling(5).apply(lambda s: linregress(range(5), s.values)[0])
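A minimal end-to-end sketch of that workaround, assuming a hypothetical daily DatetimeIndex. It also passes raw=True so each window arrives as a plain NumPy array and the index is never consulted, and it reads the named slope field of the linregress result (equivalent to [0]):

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress

# Same data as the question, but indexed by (hypothetical) dates instead of integers
idx = pd.date_range("2021-01-01", periods=11, freq="D")
df = pd.DataFrame({"col1": [2, 5, 6, 5, 4, 6, 7, 8, 9, 7, 5]}, index=idx)

N = 5
x = np.arange(N)  # fixed positions 0..N-1, independent of the index

# raw=True hands each window to the lambda as a NumPy array, so dates never reach linregress
df["slope"] = df["col1"].rolling(N).apply(
    lambda window: linregress(x, window).slope, raw=True
)
print(df)
```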

Upvotes: 2

Views: 3366

Answers (2)

mozway

Reputation: 260455

You can combine rolling + apply with scipy.stats.linregress:

from scipy.stats import linregress

df['slope'] = df['col1'].rolling(5).apply(lambda s: linregress(s.reset_index())[0])

print(df)

output:

    col1  slope
0      2    NaN
1      5    NaN
2      6    NaN
3      5    NaN
4      4    0.4
5      6    0.0
6      7    0.3
7      8    0.9
8      9    1.2
9      7    0.4
10     5   -0.5
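When linregress receives a single two-column argument, it splits it into x and y; here s.reset_index() supplies the window's index labels as x and its values as y. Since a slope does not change when x is shifted by a constant, the integer labels 4..8 give the same result as positions 0..4. A sketch that spells out x and y explicitly, assuming the default integer RangeIndex from the question (window_slope is a hypothetical helper name):

```python
import pandas as pd
from scipy.stats import linregress

df = pd.DataFrame({"col1": [2, 5, 6, 5, 4, 6, 7, 8, 9, 7, 5]})

def window_slope(s: pd.Series) -> float:
    # x = the window's integer index labels, y = the window's values
    return linregress(s.index.to_numpy(dtype=float), s.to_numpy()).slope

# raw=False (the default) passes each window as a Series, so its index is available
df["slope"] = df["col1"].rolling(5).apply(window_slope, raw=False)
print(df)
```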

Upvotes: 5

BENY

Reputation: 323226

Here is a NumPy-based approach:

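import numpy as np

def slope_numpy(x, y):
    # polyfit returns coefficients highest power first: [slope, intercept].
    # (np.poly1d(fit)[0] would be the constant term, i.e. the intercept.)
    fit = np.polyfit(x, y, 1)
    return fit[0]

df.col1.rolling(5).apply(lambda x: slope_numpy(range(5), x))

Output (rounded to one decimal place):

0     NaN
1     NaN
2     NaN
3     NaN
4     0.4
5     0.0
6     0.3
7     0.9
8     1.2
9     0.4
10   -0.5
Name: col1, dtype: float64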
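np.polyfit also accepts a 2-D y and fits each column separately, so all windows can be fitted in a single call with no rolling.apply and no Python-level loop. A sketch, assuming NumPy 1.20+ for sliding_window_view; sliding_window_view returns a read-only view, so building the windows does not copy the data:

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

df = pd.DataFrame({"col1": [2, 5, 6, 5, 4, 6, 7, 8, 9, 7, 5]})
N = 5

values = df["col1"].to_numpy(dtype=float)
windows = sliding_window_view(values, N)  # shape (len(df) - N + 1, N), one window per row
x = np.arange(N)

# Each column of windows.T is one window; polyfit fits them all at once.
# Result has shape (2, n_windows): row 0 = slopes, row 1 = intercepts.
coeffs = np.polyfit(x, windows.T, 1)

slopes = np.full(len(df), np.nan)
slopes[N - 1:] = coeffs[0]  # the first N - 1 rows have no full window
df["slope"] = slopes
print(df)
```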

Upvotes: 3
