Reputation: 1767
I have a dataframe with > 250k rows and I'd like to compute rolling regression slopes. I can do it with the following code, but it takes over a minute. Is there anything I can do to speed this up?
import numpy as np
import pandas as pd
from datetime import datetime
from scipy.stats import linregress
# Some data
df = pd.DataFrame({'y':np.random.normal(0,1,250000)})
def compute_slope(y):
    # Regress the window's values on their positions 0..len(y)-1
    output = linregress(list(range(len(y))), y)
    return output.slope
start = datetime.now()
df['slopes'] = df['y'].rolling(window=15).apply(compute_slope)
print(f"Duration of rolling slopes = {datetime.now() - start}")
Out[12]: Duration of rolling slopes = 0:01:06.327182
Upvotes: 2
Views: 1741
Reputation: 150735
With np.polyfit and as_strided you can do something like this:
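import numpy as np
from numpy.lib.stride_tricks import as_strided
window = 15
ys = df.y.to_numpy()
stride = ys.strides
# View ys as overlapping windows, one per row, then transpose so that
# each column is a window; polyfit fits every column in a single call
slopes, intercepts = np.polyfit(np.arange(window),
                                as_strided(ys, (len(df)-window+1, window),
                                           stride+stride).T,
                                deg=1)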
Performance:
CPU times: user 148 ms, sys: 9.86 ms, total: 157 ms
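Note that slopes from the snippet above has len(df) - window + 1 entries, while rolling(window).apply(...) returns one value per row with NaN in the first window - 1 positions. A minimal sketch of writing the result back to the dataframe follows. It swaps as_strided for sliding_window_view, which builds the same windows with bounds checking but assumes NumPy 1.20 or newer; the 1000-row dataframe and the seed are only there to keep the example small.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

window = 15

# Small illustrative dataframe (assumption: stands in for the 250k-row one)
rng = np.random.default_rng(0)
df = pd.DataFrame({'y': rng.normal(0, 1, 1000)})

ys = df['y'].to_numpy()

# One window per row: shape (len(ys) - window + 1, window)
windows = sliding_window_view(ys, window)

# polyfit fits each column of a 2-D y, so pass the windows transposed
slopes, intercepts = np.polyfit(np.arange(window), windows.T, deg=1)

# Pad the front with NaN so the values line up with rolling(window).apply(...)
df['slopes'] = np.concatenate([np.full(window - 1, np.nan), slopes])
```

The padding puts each slope on the last row of its window, which is where rolling places it by default.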
Upvotes: 4