Yi Fang

Reputation: 37

Speed up pandas Series.rolling().apply()

I need to compute the product of all values within rolling windows of a pandas Series, ignoring NaN.

My current approach is pandas.Series.rolling().apply(), but it is rather slow compared to the built-in rolling functions. I am working on huge DataFrames, so speed is my main concern.

As a demonstration:

import numpy as np
import pandas as pd
a = pd.Series(range(100))
%timeit -n100 a.rolling(5).apply(np.nanprod, raw=True)
5.58 ms ± 163 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit -n100 a.rolling(5).mean()
236 µs ± 19 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

So apply() is more than 20 times slower than the built-in mean().

1. Is there a way to speed up the apply() call?

2. Or is there a built-in product function for rolling windows (ideally one that ignores NaN)? I can't find one in the docs.

Upvotes: 1

Views: 141

Answers (2)

Valdi_Bo

Reputation: 30971

The solution to your problem is NumPy's as_strided function (np.lib.stride_tricks.as_strided).

To use it, define the following function:

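import numpy as np

def roll_win(a, win):
    # View the last axis as overlapping windows of length win (no data copy)
    shape = a.shape[:-1] + (a.shape[-1] - win + 1, win)
    # Stepping to the next window and to the next element both move one element
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides, writeable=False)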
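On NumPy 1.20 or later, np.lib.stride_tricks.sliding_window_view builds the same read-only windowed view without hand-written stride arithmetic, which removes the risk of as_strided reading out of bounds if shape or strides are miscomputed. A minimal sketch assuming NumPy >= 1.20; roll_win_safe is a hypothetical name, and axis=-1 is used so the window axis is appended exactly as in roll_win:

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import as_strided, sliding_window_view

def roll_win(a, win):
    # The answer's helper: manual shape/strides for a windowed view
    shape = a.shape[:-1] + (a.shape[-1] - win + 1, win)
    strides = a.strides + (a.strides[-1],)
    return as_strided(a, shape=shape, strides=strides)

def roll_win_safe(a, win):
    # Hypothetical helper: NumPy computes the shape/strides itself and
    # returns a read-only view (requires NumPy >= 1.20)
    return sliding_window_view(a, win, axis=-1)

a = pd.Series(range(100))
w1 = roll_win(a.values, 5)
w2 = roll_win_safe(a.values, 5)
print(w2.shape)                # (96, 5)
print(np.array_equal(w1, w2))  # True
```

Both views can then be passed to np.nanprod(..., axis=1) in the same way.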

Then call np.nanprod on the result of this function:

np.nanprod(roll_win(a.values, 5), axis=1)

The difference is that the result is a 1-D NumPy array without the 4 leading NaN values (win - 1 of them) that rolling() produces, but it should be significantly faster.
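If the result needs to line up with the original index, the windowed products can be padded with win - 1 leading NaN values. A minimal sketch, assuming a 1-D numeric Series and win <= len(s); rolling_nanprod is a hypothetical helper name, not a pandas function:

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import as_strided

def roll_win(a, win):
    shape = a.shape[:-1] + (a.shape[-1] - win + 1, win)
    strides = a.strides + (a.strides[-1],)
    return as_strided(a, shape=shape, strides=strides)

def rolling_nanprod(s, win):
    # Hypothetical helper: convert to a contiguous float array, take the
    # product of each window, then pad with win - 1 NaNs at the front
    vals = s.to_numpy(dtype=float)
    prods = np.nanprod(roll_win(vals, win), axis=1)
    out = np.concatenate([np.full(win - 1, np.nan), prods])
    return pd.Series(out, index=s.index)

a = pd.Series(range(100))
fast = rolling_nanprod(a, 5)
slow = a.rolling(5).apply(np.nanprod, raw=True)
print(np.allclose(fast, slow, equal_nan=True))  # True
```

One caveat: with the default min_periods (equal to the window size), rolling().apply() should return NaN for any window containing a NaN, whereas np.nanprod over the strided windows skips it, so the two agree exactly only on NaN-free data such as this example.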

Upvotes: 1

gosuto

Reputation: 5741

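Actually, there is a .prod() method (Series.prod), which ignores NA/null values by default (skipna=True). Note that it reduces the whole Series to a single value; rolling window objects do not expose a .prod() method.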
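Since the rolling object itself has no product method, one workaround for strictly positive data (a swapped-in technique, not something from the answers above) is to sum logarithms with the fast built-in rolling sum() and exponentiate, since prod(x) = exp(sum(log(x))). Replacing log(NaN) with 0 makes a missing value act as a factor of 1, mirroring np.nanprod. A minimal sketch assuming every non-NaN value is > 0 (zeros and negatives break the logarithm); rolling_logprod is a hypothetical name, and the results carry floating-point rounding error:

```python
import numpy as np
import pandas as pd

def rolling_logprod(s, win):
    # Hypothetical helper, valid only for strictly positive values:
    # log each value, treat NaN as log(1) = 0, rolling-sum, then exp back
    logs = np.log(s.astype(float)).fillna(0.0)
    return np.exp(logs.rolling(win).sum())

rng = np.random.default_rng(0)
s = pd.Series(rng.uniform(0.5, 2.0, size=1000))
s.iloc[[10, 500]] = np.nan  # NaNs are skipped, like np.nanprod

fast = rolling_logprod(s, 5)
# Reference: min_periods=1 so windows containing a NaN are not discarded
ref = s.rolling(5, min_periods=1).apply(np.nanprod, raw=True)
print(np.allclose(fast.iloc[4:], ref.iloc[4:]))  # True
```

The first win - 1 entries stay NaN because rolling(win).sum() uses the default min_periods.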

Upvotes: 0
