Reputation: 745
What I'm trying to do is probably best illustrated by an example. Say we have the following dataframe:
ID Category Label Price
----------------------------------------------------
00001 Low Alpha 1.00
00001 Low Beta 1.50
00001 Med Chi 2.00
00001 Med Delta 2.50
00001 High Epsilon 3.00
00001 High Phi 3.50
00002 Low Alpha 1.00
00002 Low Beta 1.50
00002 Med Chi 2.50
00002 Med Delta 2.50
00002 High Epsilon 3.00
00002 High Phi 3.50
For each ID and each Label within each ID, I want to return the Label with the highest price, along with the price. For example:
ID Category Label Price
----------------------------------------------------
00001 Low Beta 1.50
00001 Med Delta 2.50
00001 High Phi 3.50
00002 Low Beta 1.50
00002 Med Delta 2.50
00002 High Phi 3.50
Initially, I thought of doing this with nested FOR statements - like iterating through a multivariate array, but I know that's not the Pandas way.
Upvotes: 0
Views: 57
Reputation: 23
Similarly, you can groupby ID, Category and Label then aggregate on Price
(df
.groupby(['ID','Category','Label'])
.agg(Price =('Price','max'))
).reset_index()
Upvotes: 1
Reputation: 153460
IIUC, you can try this:
df.loc[df.groupby(['ID','Category'], group_keys=False)['Price'].idxmax()]
Output:
ID Category Label Price
5 00001 High Phi 3.5
1 00001 Low Beta 1.5
3 00001 Med Delta 2.5
11 00002 High Phi 3.5
7 00002 Low Beta 1.5
8 00002 Med Chi 2.5
Upvotes: 3