Reputation: 4070
Consider a dataframe of 10 rows with two columns, A and B, as follows:
A B
0 21 6
1 87 0
2 87 0
3 25 0
4 25 0
5 14 0
6 79 0
7 70 0
8 54 0
9 35 0
In Excel I can calculate a rolling mean for every row except the first: each B is the mean of the previous row's A and B.
How can I do this in pandas?
Here is what I've tried:
import pandas as pd
df = pd.read_clipboard()  # copy the dataframe above first; read_clipboard() then populates df
for i in range(1, len(df)):
    df.loc[i, 'B'] = df[['A', 'B']].loc[i-1].mean()
This gives me the desired result, matching Excel. But is there a better pandas way to do it? I tried expanding and rolling, but neither produced the desired result.
Upvotes: 4
Views: 2363
Reputation: 2369
What you have is an exponentially weighted moving average rather than a simple moving average, which is why pd.DataFrame.rolling didn't work. You are probably looking for pd.DataFrame.ewm instead.
Starting from
df
Out[399]:
A B
0 21 6
1 87 0
2 87 0
3 25 0
4 25 0
5 14 0
6 79 0
7 70 0
8 54 0
9 35 0
df['B'] = df["A"].shift().fillna(df["B"]).ewm(com=1, adjust=False).mean()
df
Out[401]:
A B
0 21 6.000000
1 87 13.500000
2 87 50.250000
3 25 68.625000
4 25 46.812500
5 14 35.906250
6 79 24.953125
7 70 51.976562
8 54 60.988281
9 35 57.494141
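To see why the one-liner matches the loop, here is a minimal sketch that rebuilds the question's data by hand (instead of using read_clipboard) and compares both results. The names loop_b and ewm_b are only illustrative.

```python
import numpy as np
import pandas as pd

# Data from the question; B[0] = 6 seeds the recurrence.
df = pd.DataFrame({
    "A": [21, 87, 87, 25, 25, 14, 79, 70, 54, 35],
    "B": [6, 0, 0, 0, 0, 0, 0, 0, 0, 0],
})

# Loop version: B[i] = mean(A[i-1], B[i-1]).
loop_b = [float(df.loc[0, "B"])]
for i in range(1, len(df)):
    loop_b.append((df.loc[i - 1, "A"] + loop_b[-1]) / 2)

# Vectorised version: com=1 gives alpha = 1 / (1 + com) = 0.5, and
# adjust=False uses the recursive form y[t] = (1 - alpha) * y[t-1] + alpha * x[t]
# with y[0] = x[0]. Shifting A by one row and filling the gap with B[0]
# makes x[0] = B[0] and x[t] = A[t-1], which is exactly the loop above.
ewm_b = df["A"].shift().fillna(df["B"]).ewm(com=1, adjust=False).mean()

print(np.allclose(loop_b, ewm_b))
```

The shift is what turns "previous row's A" into the current input, and the fillna call supplies the starting value that the loop takes from B[0].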
Even on just ten rows, this is about 10 times faster when measured with %timeit (959 µs versus 10.3 ms). On 100 rows, the speedup grows to about a factor of 100 (1.1 ms versus 110 ms).
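Outside IPython, a comparison like this could be reproduced with the standard timeit module. The sketch below is only an approximation of the setup: make_df, with_loop and with_ewm are hypothetical helpers, the 100-row data simply repeats the question's A values, and the actual timings will differ by machine and pandas version.

```python
import timeit

import pandas as pd


def make_df(n=100):
    # Hypothetical test data: A cycles through the question's values,
    # B[0] = 6 is the seed. B is float so the loop can store means in it.
    base = [21, 87, 87, 25, 25, 14, 79, 70, 54, 35]
    a = [base[i % len(base)] for i in range(n)]
    return pd.DataFrame({"A": a, "B": [6.0] + [0.0] * (n - 1)})


def with_loop(df):
    # Row-by-row approach from the question, run on a copy.
    df = df.copy()
    for i in range(1, len(df)):
        df.loc[i, "B"] = df[["A", "B"]].loc[i - 1].mean()
    return df["B"]


def with_ewm(df):
    # Vectorised approach from the answer.
    return df["A"].shift().fillna(df["B"]).ewm(com=1, adjust=False).mean()


df = make_df(100)
runs = 5
loop_time = timeit.timeit(lambda: with_loop(df), number=runs) / runs
ewm_time = timeit.timeit(lambda: with_ewm(df), number=runs) / runs
print(f"loop: {loop_time * 1e3:.2f} ms, ewm: {ewm_time * 1e3:.2f} ms")
```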
Upvotes: 5