Rolling weighted mean in pandas using date range

Question

I want to calculate a rolling weighted mean of a time series, with the average computed over a specific time interval. For example, this calculates the rolling mean with a 90-day window (not weighted):

import numpy as np
import pandas as pd

data = np.random.randint(0, 1000, (1000, 10))
index = pd.date_range("20190101", periods=1000, freq="18H")

df = pd.DataFrame(index=index, data=data)

df = df.rolling("90D").mean()

However, when I apply a weighting function (line below) I get an error: "ValueError: Invalid window 90D"

df = df.rolling("90D", win_type="gaussian").mean(std=60)

On the other hand, the weighted average works if I make the window an integer instead of an offset:

df = df.rolling(90, win_type="gaussian").mean(std=60)

Using an integer does not work for my application since the observations are not evenly spaced in time.

Two questions:

Can I do a weighted rolling mean with an offset (e.g. "90D" or "3M")?
If I can do a weighted rolling mean with an offset, then what does std refer to when I specify window="90D" and win_type="gaussian"; does it mean the std is 60D?

Daniel Fonnegra García · Accepted Answer

Okay, I discovered that it's not implemented in pandas yet.

Look here: https://github.com/pandas-dev/pandas/blob/v0.25.0/pandas/core/window.py

If you follow line 2844, you'll see that when win_type is not None, a Window object is returned:

if win_type is not None:
    return Window(obj, win_type=win_type, **kwds)

Then check the validate method of the Window object at line 630: it only allows integer or list-like windows.

I think this is because pandas builds the weights with the scipy.signal library, whose window functions return a fixed-length array of weights for a given number of points, so it cannot take into account how your observations are distributed over time.

You could implement your own weighting function and use apply, but it won't perform very well.
