SOK
SOK

Reputation: 1792

Apply a EWMA rolling window function in Pandas but avoid initial NAN values

I have the following dataframe and subsequent EWMA function:

from functools import partial

#Create DF
d = {'Name': ['Jim', 'Jim','Jim','Jim','Jim','Jim','Jim','Jim',], 'col2': [5,5,5,5,5,5,5,5]}
df1 = pd.DataFrame(data=d)


#EWMA 5
alpha = 1-np.log(2)/3 
window5 = 5
weights5 = list(reversed([(1-alpha)**n for n in range(window5)]))
ewma5 = partial(np.average, weights=weights5)

df1['Rolling5'] = df1.groupby('Name')['col2'].transform(lambda x: x.rolling(5).apply(ewma5))
df1

Which results in:

enter image description here

I have a specified a rolling window of 5 but does anyone know how can I get the EWMA to calculate in the first to 4th Row even though there arent 5 values?

EG for row 1, calculate for just row 1 (which would just be the same value) and then for row 2 it calculates the EWMA of rows 1 & 2. Also open to more efficient ways of doing this!

Thanks very much!

Upvotes: 1

Views: 1332

Answers (2)

perl
perl

Reputation: 9941

You can use ewm and set min_periods in rolling to 1:

def f(x):
    return x.ewm(alpha=1-np.log(2)/3).mean().iloc[-1]

df1['Rolling5'] = df1.groupby('Name')['col2'].transform(
    lambda x: x.rolling(5, min_periods=1).apply(f))

Comparing with the original:

df1['Rolling5_original'] = df1.groupby('Name')['col2'].transform(
    lambda x: x.rolling(5).apply(ewma5))

df1['Rolling5'] = df1.groupby('Name')['col2'].transform(
    lambda x: x.rolling(5, min_periods=1).apply(f))

df1

Output:

  Name  col2  Rolling5_original  Rolling5
0  Jim     1                NaN  1.000000
1  Jim     2                NaN  1.812315
2  Jim     3                NaN  2.736992
3  Jim     4                NaN  3.710959
4  Jim     5           4.702821  4.702821
5  Jim     6           5.702821  5.702821
6  Jim     7           6.702821  6.702821
7  Jim     8           7.702821  7.702821

Upvotes: 2

Mustafa Aydın
Mustafa Aydın

Reputation: 18315

You're close, if you specify the min_periods=1, the windows out of rolling will start from size 1 and then expand till 5 and stay there. As for the average, we will pass the weights' corresponding parts to cover the cases it will fall short:

weights = (1-alpha) ** np.arange(5)[::-1]

df["rolling_5"] = (df.col2
                     .rolling(5, min_periods=1)
                     .apply(lambda win: np.average(win, weights=weights[-win.size:]))
)

to get

  Name  col2  rolling_5
0  Jim     5        5.0
1  Jim     5        5.0
2  Jim     5        5.0
3  Jim     5        5.0
4  Jim     5        5.0
5  Jim     5        5.0
6  Jim     5        5.0
7  Jim     5        5.0

Upvotes: 2

Related Questions