Reputation: 1057
I am trying to do an exponentially-weighted moving average, where decay is specified in terms of halflife on a datetime column, using pandas ewm function. Pandas ewm function works similar to the pandas expand function in that it rolls over the whole dataframe. In my case however, I need to specify a fixed time window or offset over which the ewm function is applied. In other words, an ewma with a cutoff or "max_periods" parameter.
My solution to this is as follows:
df = pd.DataFrame({'a': np.random.randint(5, size=24),
'b': ["S", "A"] * 12,
'c': pd.date_range(start='1/1/2018', end='12/12/2018', freq='15D')})
df.groupby('b').rolling('60d', on='c')['a'].apply(lambda x: x.ewm(halflife='15d', times=x.index).mean().tail(1))
My solution is very inefficient. Looking for something faster.
Upvotes: 3
Views: 1435
Reputation: 1679
I came up with the following, but it's only slightly faster when I timed it. I'd be curious to know if there's a better solution.
WEIGHTS = [pow(2, i) for i in range(10)] # at least as many as the size of your window
def weighted_avg(df):
weights = WEIGHTS[0:len(df)]
return df.mul(weights).sum() / sum(weights)
df.groupby('b').rolling('60d', on='c')['a'].apply(weighted_avg)
Upvotes: 1