fffrost

Reputation: 1767

Computing rolling slope on a large pandas DataFrame

I have a dataframe with > 250k rows and I'd like to compute rolling regression slopes. I can do it with the following code, but it takes over a minute. Is there anything I can do to speed this up?

import numpy as np
import pandas as pd
from datetime import datetime
from scipy.stats import linregress

# Some data
df = pd.DataFrame({'y':np.random.normal(0,1,250000)})

def compute_slope(y):
    output = linregress(list(range(len(y))), y)
    return output.slope

start = datetime.now()
df['slopes'] = df['y'].rolling(window=15).apply(compute_slope)
print(f"Duration of rolling slopes = {datetime.now() - start}")

Out[12]: Duration of rolling slopes = 0:01:06.327182

Upvotes: 2

Views: 1741

Answers (1)

Quang Hoang

Reputation: 150735

With np.polyfit and as_strided you can do something like this:

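import numpy as np
from numpy.lib.stride_tricks import as_strided

window = 15
ys = df['y'].to_numpy()
stride = ys.strides  # 1-tuple: byte step between consecutive elements

# View ys as overlapping windows of shape (n - window + 1, window), without copying.
windows = as_strided(ys, shape=(len(ys) - window + 1, window),
                     strides=stride + stride, writeable=False)

# polyfit fits each column of a 2-D y separately, hence the transpose.
slopes, intercepts = np.polyfit(np.arange(window), windows.T, deg=1)

# The first window - 1 rows have no full window, so pad them with NaN.
df['slopes'] = np.concatenate([np.full(window - 1, np.nan), slopes])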

Performance:

CPU times: user 148 ms, sys: 9.86 ms, total: 157 ms
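
If you would rather avoid as_strided (it does no bounds checking, so a wrong shape can read out-of-range memory), here is a minimal sketch of the same idea using sliding_window_view, which requires NumPy 1.20 or later. It is not the code timed above. It assumes x is always 0..window-1, as in the question, so the slope reduces to the closed form sum((x - mean(x)) * y) / sum((x - mean(x))**2). The function name rolling_slope is only illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_slope(y, window):
    """OLS slope of y against 0..window-1 for every full window.

    Returns an array as long as y; the first window - 1 entries are NaN,
    which matches the alignment of Series.rolling(window).apply(...).
    """
    y = np.asarray(y, dtype=float)
    out = np.full(y.shape[0], np.nan)
    if y.shape[0] < window:
        return out  # no full window available

    x = np.arange(window, dtype=float)
    x_centered = x - x.mean()

    # Read-only view of shape (n - window + 1, window); no data is copied.
    windows = sliding_window_view(y, window)

    # Because x_centered sums to zero, mean(y) drops out of the numerator.
    out[window - 1:] = windows @ x_centered / (x_centered @ x_centered)
    return out
```

Usage would be df['slopes'] = rolling_slope(df['y'].to_numpy(), 15). Note that a NaN anywhere in y propagates to every window containing it.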

Upvotes: 4
