Reputation: 5355
Having a dataframe df
with the columns id
,price
and units
like so
id | price | units
---+-------+--------
1 | 10 | 1
1 | 15 | 4
1 | 13 | 3
2 | 5 | 12
2 | 1 | 20
I can apply multiple functions to specific columns like
df.groupby("id").agg({"price":["mean","max"], "units":["mean","max"])
Since we have max
and mean
twice, it might be nice to rename them (I am aware that the output is a multicolumn-index) to be able to tell the difference.
Normally when using .agg
we can specify the name like df.groupby("id")["price"](max_price="max")
but it does not seem to work the same way when parsing a dict e.g
df.groupby("id").agg({"price":[(mean_price="mean"),(max_price="max")],
"units":[(mean_unit="mean"),(max_unit="max")]})
Upvotes: 1
Views: 173
Reputation: 862731
Use named aggregation - there is format different - new column name
with tuples for columns name
with aggregation function
:
df1 = (df.groupby("id").agg(mean_price= ("price","mean"),
max_price=("price","max"),
mean_unit=("units","mean"),
max_unit=("units","max")))
print (df1)
mean_price max_price mean_unit max_unit
id
1 12.666667 15 2.666667 4
2 3.000000 5 16.000000 20
Another solution if want pass dictionary is use dict with **
for unpacking arguments:
df1 = (df.groupby("id").agg(**{'mean_price':("price","mean"),
'max_price':("price","max"),
'mean_unit':("units","mean"),
'max_unit':("units","max")}))
Upvotes: 2