Reputation: 165
I want to compute rolling sums group-wise for a large number of groups and I'm having trouble doing it acceptably quickly.
Pandas has built-in methods for rolling and expanding calculations.
Here's an example:
import pandas as pd
import numpy as np
obs_per_g = 20
g = 10000
obs = g * obs_per_g
k = 20
df = pd.DataFrame(
    data=np.random.normal(size=obs * k).reshape(obs, k),
    index=pd.MultiIndex.from_product(iterables=[range(g), range(obs_per_g)]),
)
To get rolling and expanding sums I can use
df.groupby(level=0).expanding().sum()
df.groupby(level=0).rolling(window=5).sum()
But this takes a long time when the number of groups is very large. For expanding sums, using the pandas method cumsum instead is almost 60 times quicker (16 s vs 280 ms for the above example) and turns hours into minutes.
df.groupby(level=0).cumsum()
Is there a fast implementation of rolling sum in pandas, like cumsum is for expanding sums? If not, could I use numpy to accomplish this?
Upvotes: 4
Views: 2534
Reputation: 193
To provide the latest information on this: if you upgrade pandas, the performance of groupby rolling has been significantly improved. It is roughly 4-5x faster in 1.1.0 and about 12x faster in 1.2.0 and later, compared to 0.24 or 1.0.0.
I believe the biggest performance improvement comes from this PR, which means more of the work can be done in Cython (previously it was implemented roughly like groupby.apply(lambda x: x.rolling())).
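For illustration, the slow pattern that older versions effectively ran looks roughly like the sketch below (an approximation based on the description above, not the actual internals), reusing the df from the question:
# Rough equivalent of the pre-cythonized implementation (sketch only):
# each group's frame goes through a Python-level rolling call, which is slow.
slow_equivalent = df.groupby(level=0).apply(lambda x: x.rolling(10, min_periods=1).sum())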
I used the below code to benchmark:
import pandas
import numpy

print(pandas.__version__)
print(numpy.__version__)

def stack_overflow_df():
    obs_per_g = 20
    g = 10000
    obs = g * obs_per_g
    k = 2
    df = pandas.DataFrame(
        data=numpy.random.normal(size=obs * k).reshape(obs, k),
        index=pandas.MultiIndex.from_product(iterables=[range(g), range(obs_per_g)]),
    )
    return df
df = stack_overflow_df()
%%timeit
# N.B. droplevel important to make indices match
rolling_result = (
    df.groupby(level=0)[[0, 1]].rolling(10, min_periods=1).sum().droplevel(level=0)
)
df[["value_0_rolling_sum", "value_1_rolling_sum"]] = rolling_result

# results:
# numpy version always 1.19.4
# pandas 0.24 = 12.3 seconds
# pandas 1.0.5 = 12.9 seconds
# pandas 1.1.0 = broken with groupby rolling bug
# pandas 1.1.1 = 2.9 seconds
# pandas 1.1.5 = 2.5 seconds
# pandas 1.2.0 = 1.06 seconds
# pandas 1.2.2 = 1.06 seconds
I think care must be taken when trying to use a cumsum-based approach to improve performance (regardless of pandas version). For example, using something like the below:
# Gives different output
df.groupby(level=0)[[0, 1]].cumsum() - df.groupby(level=0)[[0, 1]].cumsum().shift(10)
While this is much faster, the output is not correct. The shift is performed over all rows and mixes the cumsums of different groups, i.e. cumsum values from the end of one group leak into the first rows of the next group.
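As a quick illustrative check (a sketch reusing the benchmark df from above), you can compare the naive shifted-cumsum result with the groupby rolling result; the differences appear in the first rows of every group after the first:
# Naive version: the shift leaks cumsum values across group boundaries
naive = (
    df.groupby(level=0)[[0, 1]].cumsum()
    - df.groupby(level=0)[[0, 1]].cumsum().shift(10)
)
# Per-group rolling version from earlier
correct = (
    df.groupby(level=0)[[0, 1]]
    .rolling(10, min_periods=1)
    .sum()
    .droplevel(level=0)
)
# Large non-zero values show where the naive approach mixes groups
print((naive - correct).abs().max())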
To have the same behaviour as above, you need to use apply:
df.groupby(level=0)[[0, 1]].cumsum() - df.groupby(level=0)[[0, 1]].apply(
    lambda x: x.cumsum().shift(10).fillna(0)
)
which, in the most recent version (1.2.2), is slower than using rolling directly. Hence, for groupby rolling sums, I don't think a cumsum-based approach is the best solution for pandas >= 1.1.1.
For completeness, if your groups are columns rather than the index, you should use syntax like this:
# N.B. reset_index important to make indices match
rolling_result = (
    df.groupby(["category_0", "category_1"])[["value_0", "value_1"]]
    .rolling(10, min_periods=1)
    .sum()
    .reset_index(drop=True)
)
df[["value_0_rolling_sum", "value_1_rolling_sum"]] = rolling_result
Upvotes: 1
Reputation: 984
I had the same experience with .rolling(): it's nice, but only with small datasets or if the function you are applying is non-standard. With sum(), I would suggest using cumsum() and subtracting cumsum().shift(5):
df.groupby(level=0).cumsum() - df.groupby(level=0).cumsum().shift(5)
Upvotes: 3