nick

How to speed up a pandas rolling weighted average (or other custom rolling functions)?

I want to calculate a rolling weighted average over a fixed-size window for a long array.

For example:

import pandas as pd
import numpy as np

a = pd.Series(range(int(1e8)))
w = np.array(list(range(10)))

# Weighted sum of each 10-element window (w[0] weights the oldest value)
a.rolling(10).apply(lambda x: (x * w).sum())

When I tried this, it was very slow. I read in a blog that it can sometimes be sped up with:

 a.rolling(10).apply(np.argmax, engine='numba', raw=True)

but this only seems to work with built-in functions; for my custom function it does not seem to work.

Do you know how to make this run in a reasonable amount of time?

Answers (1)

Shubham Sharma

Solution with np.convolve

We can convolve the weights w over the series a with np.convolve in 'valid' mode. Because convolution flips the kernel, we pass the reversed weights w[::-1]; the result is then the same as the rolling weighted sum, computed in a single vectorized call.
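Written out with k = len(w), t the 0-based index of the 'valid' output, and NumPy's documented definition of convolution, (a ∗ v)_n = Σ_m a_m v_{n−m}, applied with v = w[::-1]:

$$
\texttt{np.convolve}(a, \tilde{w}, \text{'valid'})_t = \sum_{m} a_m\, \tilde{w}_{t+k-1-m} = \sum_{j=0}^{k-1} a_{t+j}\, w_j, \qquad \tilde{w}_i = w_{k-1-i}
$$

That is, output t equals (x * w).sum() for the window x = a[t : t + k], which pandas reports at position t + k − 1.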

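# Reverse w because np.convolve flips the kernel; 'valid' keeps only full windows
s = np.convolve(a, w[::-1], 'valid')
# Prepend len(w) - 1 NaNs so the result lines up with a.rolling(10)
s = [np.nan] * (len(w) - 1) + list(s)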
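As a quick sanity check (a sketch on a small, made-up series rather than the 1e8 elements from the question), the padded convolution result can be compared element-wise against the original rolling + apply version:

```python
import numpy as np
import pandas as pd

# Small illustrative inputs; 1e8 elements would make rolling + apply far too slow
a = pd.Series(range(100))
w = np.array(list(range(10)))

# Reference: the slow pandas version from the question
expected = a.rolling(10).apply(lambda x: (x * w).sum())

# Convolution with reversed weights, padded with NaN to the same length
s = np.convolve(a, w[::-1], 'valid')
s = [np.nan] * (len(w) - 1) + list(s)

# Both the NaN positions and the values should agree
assert np.allclose(expected.to_numpy(), np.array(s), equal_nan=True)
```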

Solution with sliding_window_view

Alternatively, we can use sliding_window_view (available since NumPy 1.20) to speed up the rolling weighted sum computation:

from numpy.lib.stride_tricks import sliding_window_view

# Each row of the view is one window of a (no data is copied); weight it and sum per row
s = (sliding_window_view(a, len(w)) * w).sum(1)
s = [np.nan] * (len(w) - 1) + list(s)
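A possible variation, sketched here but not part of the timings below: (view * w).sum(1) first materializes a temporary array of shape (len(a) − len(w) + 1, len(w)), whereas a matrix-vector product with @ computes the same per-window dot products directly and may avoid that intermediate. Whether it is actually faster depends on the NumPy build, so it is worth timing on your own data. The NaN padding is omitted here for brevity.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(100, dtype=float)  # small illustrative input
w = np.arange(10, dtype=float)

view = sliding_window_view(a, len(w))  # read-only view, shape (91, 10)
s_sum = (view * w).sum(1)              # builds a (91, 10) temporary first
s_dot = view @ w                       # one dot product per window row

assert view.shape == (len(a) - len(w) + 1, len(w))
assert np.allclose(s_sum, s_dot)
```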

Timings

a = pd.Series(range(int(1e4)))

%%timeit
s = np.convolve(a, w[::-1], 'valid')
s = [np.nan] * (len(w) - 1) + list(s)
# 1000 loops, best of 5: 626 µs per loop

%%timeit
s = (sliding_window_view(a, len(w)) * w).sum(1)
s = [np.nan] * (len(w) - 1) + list(s)
# 1000 loops, best of 5: 1.2 ms per loop

%%timeit
s = a.rolling(10).apply(lambda x: (x * w).sum())
# 1 loop, best of 5: 3.6 s per loop

As the timings show (on a 10,000-element series), np.convolve is about 5750x faster (3.6 s / 626 µs) and sliding_window_view is about 3000x faster (3.6 s / 1.2 ms) than the pandas rolling + apply method.
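The question asks for a rolling weighted average, while the code above (like the original lambda) returns the weighted sum. A minimal sketch, assuming the usual definition of dividing by the sum of the weights, that also keeps the result as a pd.Series aligned with the original index. The helper name rolling_weighted_mean is hypothetical:

```python
import numpy as np
import pandas as pd

def rolling_weighted_mean(a: pd.Series, w: np.ndarray) -> pd.Series:
    """Hypothetical helper: rolling weighted mean over windows of len(w).

    w[0] weights the oldest value in each window, matching (x * w).sum()
    in the rolling + apply version; the first len(w) - 1 values are NaN.
    """
    w = np.asarray(w, dtype=float)
    k = len(w)
    values = a.to_numpy(dtype=float)
    out = np.full(len(values), np.nan)
    if len(values) >= k:
        # Reversed weights turn the convolution into a sliding dot product
        out[k - 1:] = np.convolve(values, w[::-1], 'valid') / w.sum()
    return pd.Series(out, index=a.index, name=a.name)

a = pd.Series(range(int(1e6)))
w = np.arange(10)
result = rolling_weighted_mean(a, w)
```

The length guard matters because np.convolve silently swaps its arguments when the second one is longer, which would otherwise return a wrongly sized result for very short series.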
