Reputation: 415
I have the following data frame, where I want to calculate the average of the bu and bl layers by date and store it in a new row labelled bu-bl_avg:
date layer value
12-05-2020 bu 85
13-05-2020 bu 78
15-05-2020 bu 81
16-06-2020 bu 98
12-05-2020 bl 124
13-06-2020 bl 120
15-05-2020 bl 112
16-06-2020 bl 121
18-05-2020 bk 100
19-05-2020 bk 105
The result should look like this:
12-05-2020 bu-bl_avg 104.5
13-05-2020 bu-bl_avg 99
15-05-2020 bu-bl_avg 96.5
16-06-2020 bu-bl_avg 109.5
18-05-2020 bk 100
19-05-2020 bk 105
Upvotes: 0
Views: 33
Reputation: 862771
To be sure that only the bu and bl values are aggregated, filter those rows first, aggregate them with mean, and finally append the unmatched rows with concat:
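# select only the bu and bl rows
mask = df['layer'].isin(['bu', 'bl'])
# relabel them, then average per date
df1 = (df[mask].assign(layer='bu-bl_avg')
               .groupby(['date', 'layer'], as_index=False)['value']
               .mean())
# append the rows that were not aggregated
df = pd.concat([df1, df[~mask]])
print(df)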
         date      layer  value
0  12-05-2020  bu-bl_avg  104.5
1  13-05-2020  bu-bl_avg   78.0
2  13-06-2020  bu-bl_avg  120.0
3  15-05-2020  bu-bl_avg   96.5
4  16-06-2020  bu-bl_avg  109.5
8  18-05-2020         bk  100.0
9  19-05-2020         bk  105.0
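For reference, here is a self-contained sketch of the filter-then-concat approach with the question's sample data typed in by hand. The DataFrame construction, the variable names `avg` and `result`, and the `ignore_index=True` argument are additions for illustration, not part of the original answer. Note that the sample contains a bl row dated 13-06-2020, so 13-05-2020 and 13-06-2020 remain separate groups instead of averaging to 99.

```python
import pandas as pd

# Sample data re-typed from the question.
df = pd.DataFrame({
    'date': ['12-05-2020', '13-05-2020', '15-05-2020', '16-06-2020',
             '12-05-2020', '13-06-2020', '15-05-2020', '16-06-2020',
             '18-05-2020', '19-05-2020'],
    'layer': ['bu', 'bu', 'bu', 'bu', 'bl', 'bl', 'bl', 'bl', 'bk', 'bk'],
    'value': [85, 78, 81, 98, 124, 120, 112, 121, 100, 105],
})

# Aggregate only the bu/bl rows under a common label.
mask = df['layer'].isin(['bu', 'bl'])
avg = (df[mask].assign(layer='bu-bl_avg')
               .groupby(['date', 'layer'], as_index=False)['value']
               .mean())

# ignore_index=True renumbers the rows 0..n-1 (optional addition).
result = pd.concat([avg, df[~mask]], ignore_index=True)
print(result)
```

If the bl row was meant to be dated 13-05-2020, correcting it in the input would give the single 99.0 row from the expected result.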
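Alternatively, if it is acceptable to aggregate all rows, replace the bu and bl values first and then group the whole frame (note that in real data, other rows sharing the same date and layer will be averaged as well!):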
# relabel bu and bl, then average every (date, layer) group
df['layer'] = df['layer'].replace(['bu', 'bl'], 'bu-bl_avg')
df2 = (df.groupby(['date', 'layer'], as_index=False)['value']
         .mean())
print(df2)
         date      layer  value
0  12-05-2020  bu-bl_avg  104.5
1  13-05-2020  bu-bl_avg   78.0
2  13-06-2020  bu-bl_avg  120.0
3  15-05-2020  bu-bl_avg   96.5
4  16-06-2020  bu-bl_avg  109.5
5  18-05-2020         bk  100.0
6  19-05-2020         bk  105.0
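The difference between the two approaches only shows up when another layer has repeated dates. A small sketch on made-up data (the frame below is hypothetical and not taken from the question, and the names `only_bu_bl` and `all_rows` are illustrative) makes this visible: the filter-first approach keeps both bk rows, while grouping the whole frame averages them.

```python
import pandas as pd

# Hypothetical data: 'bk' appears twice on the same date.
df = pd.DataFrame({
    'date': ['12-05-2020', '12-05-2020', '18-05-2020', '18-05-2020'],
    'layer': ['bu', 'bl', 'bk', 'bk'],
    'value': [85, 124, 100, 110],
})

# Approach 1: aggregate only bu/bl, keep all other rows as they are.
mask = df['layer'].isin(['bu', 'bl'])
only_bu_bl = pd.concat([
    df[mask].assign(layer='bu-bl_avg')
            .groupby(['date', 'layer'], as_index=False)['value'].mean(),
    df[~mask],
], ignore_index=True)

# Approach 2: relabel, then aggregate every (date, layer) group.
all_rows = (df.assign(layer=df['layer'].replace(['bu', 'bl'], 'bu-bl_avg'))
              .groupby(['date', 'layer'], as_index=False)['value']
              .mean())

print(only_bu_bl)  # three rows: both bk rows are kept
print(all_rows)    # two rows: the bk rows are averaged
```

Which one is appropriate depends on whether duplicates in the other layers are meaningful in the real data.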
Upvotes: 2