Reputation: 433
Imagine that you have the following dataframe: Now, what we want to compute is the INDUSTRY average historical ROE, and place this number in a new column.
Date ROE id
30-06-2013 2% 1
30-06-2013 1% 2
30-09-2013 5% 1
30-09-2013 6% 2
30-09-2013 4% 3
31-12-2013 10% 1
31-12-2013 5% 2
31-12-2013 3% 3
31-03-2014 11% 1
31-03-2014 6% 2
31-03-2014 4% 3
When computing the historical ROE we want to rely on a minimum of 4 different dates. The resulting dataframe should look the following:
Date ROE id Hist. avg. ROE
30-06-2013 2% 1 NaN
30-06-2013 1% 2 NaN
30-09-2013 5% 1 NaN
30-09-2013 6% 2 NaN
30-09-2013 4% 3 NaN
31-12-2013 10% 1 NaN
31-12-2013 5% 2 NaN
31-12-2013 3% 3 NaN
31-03-2014 11% 1 5.2%
31-03-2014 6% 2 5.2%
31-03-2014 4% 3 5.2%
Upvotes: 0
Views: 285
Reputation: 150765
IIUC, you want rolling().mean()
:
df['hist'] = (df.groupby('id', as_index=False)
.ROE.rolling(4).mean()
.reset_index(level=0, drop=True)
)
Output:
Date ROE id hist
0 30-06-2013 0.02 1 NaN
1 30-06-2013 0.01 2 NaN
2 30-09-2013 0.05 1 NaN
3 30-09-2013 0.06 2 NaN
4 30-09-2013 0.04 3 NaN
5 31-12-2013 0.10 1 NaN
6 31-12-2013 0.05 2 NaN
7 31-12-2013 0.03 3 NaN
8 31-03-2014 0.11 1 0.070
9 31-03-2014 0.06 2 0.045
10 31-03-2014 0.04 3 NaN
Update: Per comment, you may want to aggregate on date
and then rolling
on the result.
df['Date'] = pd.to_datetime(df['Date'])
new_df = (df.groupby('Date').ROE.agg(['sum','count'])
.rolling(4).sum()
.assign(hist=lambda x: x['sum']/x['count'])
)
df['hist'] = df['Date'].map(new_df['hist'])
Output:
Date ROE id hist
0 2013-06-30 0.02 1 NaN
1 2013-06-30 0.01 2 NaN
2 2013-09-30 0.05 1 NaN
3 2013-09-30 0.06 2 NaN
4 2013-09-30 0.04 3 NaN
5 2013-12-31 0.10 1 NaN
6 2013-12-31 0.05 2 NaN
7 2013-12-31 0.03 3 NaN
8 2014-03-31 0.11 1 0.051818
9 2014-03-31 0.06 2 0.051818
10 2014-03-31 0.04 3 0.051818
Upvotes: 1