CutePoison
CutePoison

Reputation: 5355

Change name for multiple functions when calling .agg using a dict

Having a dataframe df with the columns id,price and units like so


id | price | units
---+-------+--------
1  | 10    | 1
1  | 15    | 4
1  | 13    | 3
2  | 5     | 12
2  | 1     | 20

I can apply multiple functions to specific columns like


df.groupby("id").agg({"price":["mean","max"], "units":["mean","max"])

Since we have max and mean twice, it might be nice to rename them (I am aware that the output is a multicolumn-index) to be able to tell the difference.

Normally when using .agg we can specify the name like df.groupby("id")["price"](max_price="max") but it does not seem to work the same way when parsing a dict e.g

df.groupby("id").agg({"price":[(mean_price="mean"),(max_price="max")],
 "units":[(mean_unit="mean"),(max_unit="max")]})

Upvotes: 1

Views: 173

Answers (1)

jezrael
jezrael

Reputation: 862731

Use named aggregation - there is format different - new column name with tuples for columns name with aggregation function:

df1 = (df.groupby("id").agg(mean_price= ("price","mean"),
                            max_price=("price","max"),
                            mean_unit=("units","mean"),
                            max_unit=("units","max")))
print (df1)
    mean_price  max_price  mean_unit  max_unit
id                                            
1    12.666667         15   2.666667         4
2     3.000000          5  16.000000        20

Another solution if want pass dictionary is use dict with ** for unpacking arguments:

df1 = (df.groupby("id").agg(**{'mean_price':("price","mean"),
                               'max_price':("price","max"),
                               'mean_unit':("units","mean"),
                               'max_unit':("units","max")}))

Upvotes: 2

Related Questions