Lynn
Lynn

Reputation: 4398

groupby two columns take highest sum (Pandas)

Looking to see which client has the highest price sum for the month of February. 2/1/2022

Data

client  box Price   date
charles AA  5,000   2/1/2022
charles AA  5,050   2/1/2022
charles AA  5,075   2/1/2022
cara    BB  25,116  2/1/2022
cara    BB  5,154   2/1/2022
lu      CC  0       2/1/2022
max     DD  10,000  3/1/2022

Desired

   client   box Price   date
   cara     BB  30,270  2/1/2022

Doing

df.groupby(['client','date']) \
.agg({'Price':'sum'}).reset_index() \

Any suggestion is helpful.

Upvotes: 1

Views: 44

Answers (4)

Paulo Costa
Paulo Costa

Reputation: 1625

idxmax will return the index of the maximum value—which you can then use to look up the row you want. Be aware that in the case that two customers are tied for the highest price, it will return only the first occurence.

Example:

(
    df
    # Fist filter to only Februrary
    .loc[lambda df_: df_["date"].dt.month == 2]
    .groupby(["client", "box"])["Price"]
    .sum()
    .reset_index()
    # Then select where price is max
    .loc[lambda df_: df_["Price"].idxmax()]
)

Upvotes: 1

Jason Baker
Jason Baker

Reputation: 3706

If box is always the same per client.

df = df.groupby(["client", "date"]).agg({"box": "first", "Price": "sum"})
df = df[df["Price"].eq(df["Price"].max())].reset_index()

If box could be different per client.

df = df.groupby(["client", "date", "box"]).agg({"Price": "sum"})
df = df[df["Price"].eq(df["Price"].max())].reset_index()

Upvotes: 1

Heelara
Heelara

Reputation: 979

Based on the OP's desired output, here is a suggestion:

gdf = df.groupby(['client','box','date']).agg({'Price':'sum'}).reset_index()
gdf.loc[gdf.Price.idxmax()]

Upvotes: 1

Soumen Ghosh
Soumen Ghosh

Reputation: 198

import pandas as pd

df = pd.DataFrame({"client": ["charles", "charles", "charles", "cara", "cara", "lu", "max"],
                    "box": ["AA", "AA", "AA", "BB", "BB", "CC", "DD"],
                    "price": [5000, 5050, 5075, 25116, 5154, 0, 10000],
                    "date": ["2/1/2022", "2/1/2022", "2/1/2022", "2/1/2022", "2/1/2022", "2/1/2022", "3/1/2022"]})
# print(df)

print(df.groupby(by= ["client", "box", "date"]).price.aggregate('sum'))

Upvotes: 1

Related Questions