Reputation: 37
I need to compute the product of all values within rolling windows of a pandas Series, ignoring NaN.
I currently use pandas.Series.rolling.apply, but it is much slower than the built-in rolling functions. I am working with huge DataFrames, so speed is my main concern.
As a demonstration:
import numpy as np
import pandas as pd
a = pd.Series(range(100))
%timeit -n100 a.rolling(5).apply(np.nanprod,raw=True)
5.58 ms ± 163 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit -n100 a.rolling(5).mean()
236 µs ± 19 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
So apply() is a lot slower than the built-in mean function.
1. Is there a way to speed up the apply process?
2. Or is there a built-in product function for rolling windows (ignoring NaN if possible)? I can't find one in the docs.
Upvotes: 1
Views: 141
Reputation: 30971
The solution to your problem is NumPy's as_strided function.
To use it, define the following function:
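def roll_win(a, win):
    # Shrink the last axis to n - win + 1 windows and add a new axis of length win.
    shape = a.shape[:-1] + (a.shape[-1] - win + 1, win)
    # Reuse the last-axis stride so overlapping windows share memory (no copy).
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)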
Then call np.nanprod on the result of this function:
np.nanprod(roll_win(a.values, 5), axis=1)
The difference is that the result is a 1-D NumPy array without the 4 initial NaN values (win - 1 of them in general), but the speed should be significantly better.
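If you want the result aligned with the original index, one option is to pad it back yourself. The following is only a sketch: rolling_nanprod is a hypothetical helper name, and it assumes NumPy 1.20 or later, where sliding_window_view offers a safer alternative to calling as_strided directly. The first win - 1 positions are filled with NaN to mimic the shape of the rolling output.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def rolling_nanprod(s, win):
    """Rolling product that treats NaN as 1, aligned to the index of `s`.

    Hypothetical helper for illustration; not part of pandas or NumPy.
    """
    values = s.to_numpy(dtype=float)
    # Start with all NaN so the first win - 1 positions stay empty.
    out = np.full(values.shape[0], np.nan)
    if values.shape[0] >= win:
        # Read-only view of shape (n - win + 1, win); no data is copied.
        windows = sliding_window_view(values, win)
        out[win - 1:] = np.nanprod(windows, axis=1)
    return pd.Series(out, index=s.index)


a = pd.Series(range(100))
result = rolling_nanprod(a, 5)
```

Note that np.nanprod returns 1 for a window that contains only NaN, so check whether that matches what you need before relying on this for data with long NaN runs.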
Upvotes: 1