Sher

Reputation: 415

Calculate the average of some rows by date and create a new row in Python pandas

I have the following data frame, where I want to calculate the average of the bu and bl layers by date and create a new row labelled bu-bl_avg:

date            layer       value       

12-05-2020      bu      85      
13-05-2020      bu      78
15-05-2020      bu      81      
16-06-2020      bu      98
12-05-2020      bl      124     
13-06-2020      bl      120
15-05-2020      bl      112     
16-06-2020      bl      121
18-05-2020      bk      100
19-05-2020      bk      105

Result should look like this:

12-05-2020      bu-bl_avg   104.5
13-05-2020      bu-bl_avg   99
15-05-2020      bu-bl_avg   96.5
16-06-2020      bu-bl_avg   109.5
18-05-2020      bk          100
19-05-2020      bk          105

Upvotes: 0

Views: 33

Answers (1)

jezrael

Reputation: 862771

To be 100% sure that only the bu and bl values are aggregated, filter those rows first, aggregate with mean, and finally append the unmatched rows with concat:

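# Select only the rows whose layer is 'bu' or 'bl'
mask = df['layer'].isin(['bu', 'bl'])

# Relabel those rows, then average the values per date
df1 = (df[mask].assign(layer='bu-bl_avg')
               .groupby(['date', 'layer'], as_index=False)['value']
               .mean())

# Append the rows that were not aggregated
df = pd.concat([df1, df[~mask]])
print(df)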
         date      layer  value
0  12-05-2020  bu-bl_avg  104.5
1  13-05-2020  bu-bl_avg   78.0
2  13-06-2020  bu-bl_avg  120.0
3  15-05-2020  bu-bl_avg   96.5
4  16-06-2020  bu-bl_avg  109.5
8  18-05-2020         bk  100.0
9  19-05-2020         bk  105.0
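
For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The sample data is copied from the question as posted, with the dates kept as plain strings. The names `averaged` and `result` are just illustrative, and `ignore_index=True` is an addition here that renumbers the index after the concat; drop it to keep the original row labels.

```python
import pandas as pd

# Sample data copied from the question (dates kept as plain strings).
df = pd.DataFrame({
    "date": ["12-05-2020", "13-05-2020", "15-05-2020", "16-06-2020",
             "12-05-2020", "13-06-2020", "15-05-2020", "16-06-2020",
             "18-05-2020", "19-05-2020"],
    "layer": ["bu", "bu", "bu", "bu", "bl", "bl", "bl", "bl", "bk", "bk"],
    "value": [85, 78, 81, 98, 124, 120, 112, 121, 100, 105],
})

# Rows that should be averaged together.
mask = df["layer"].isin(["bu", "bl"])

# Relabel the selected rows and average their values per date.
averaged = (df[mask].assign(layer="bu-bl_avg")
            .groupby(["date", "layer"], as_index=False)["value"]
            .mean())

# Append the untouched rows; ignore_index=True (an addition) renumbers the index.
result = pd.concat([averaged, df[~mask]], ignore_index=True)
print(result)
```

Because the dates are strings, a date only gets averaged when both layers use exactly the same string, which is why 13-05-2020 and 13-06-2020 stay as separate rows.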

If it is acceptable to aggregate all rows, replace the bu and bl values first and then group (note that in real data any other rows sharing the same date and layer will be averaged too):

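# Relabel 'bu' and 'bl' so they fall into the same group
df['layer'] = df['layer'].replace(['bu', 'bl'], 'bu-bl_avg')

# Average the values for every (date, layer) pair
df2 = (df.groupby(['date', 'layer'], as_index=False)['value']
         .mean())
print(df2)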

         date      layer  value
0  12-05-2020  bu-bl_avg  104.5
1  13-05-2020  bu-bl_avg   78.0
2  13-06-2020  bu-bl_avg  120.0
3  15-05-2020  bu-bl_avg   96.5
4  16-06-2020  bu-bl_avg  109.5
5  18-05-2020         bk  100.0
6  19-05-2020         bk  105.0
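
The difference between the two approaches only shows up when some other layer has repeated dates. The sketch below uses small hypothetical data (not from the question) in which two bk rows share a date. The variable names are illustrative, and `ignore_index=True` is again an optional addition.

```python
import pandas as pd

# Hypothetical data: two 'bk' rows share a date to expose the difference.
df = pd.DataFrame({
    "date": ["12-05-2020", "12-05-2020", "18-05-2020", "18-05-2020"],
    "layer": ["bu", "bl", "bk", "bk"],
    "value": [85, 124, 100, 110],
})

# Replace-then-group: every (date, layer) pair is averaged, including 'bk'.
relabeled = df.assign(layer=df["layer"].replace(["bu", "bl"], "bu-bl_avg"))
all_aggregated = (relabeled.groupby(["date", "layer"], as_index=False)["value"]
                  .mean())

# Filter-then-concat: only 'bu'/'bl' are averaged, 'bk' rows pass through as-is.
mask = df["layer"].isin(["bu", "bl"])
only_bu_bl = pd.concat([
    df[mask].assign(layer="bu-bl_avg")
            .groupby(["date", "layer"], as_index=False)["value"].mean(),
    df[~mask],
], ignore_index=True)

print(all_aggregated)  # the two 'bk' rows collapse into one averaged row
print(only_bu_bl)      # both 'bk' rows are kept separately
```

So pick the filter-then-concat version if the other layers must stay untouched, and the replace-then-group version if averaging duplicates everywhere is fine.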

Upvotes: 2
