Dipie
Dipie

Reputation: 37

Retrieve rows with highest value with condition

I have a dataframe that looks like this:

| Id | Label | Width |
|----|-------| ------|
| 0  |   A   |   5   |
| 0  |   A   |   3   |
| 0  |   B   |   4   |
| 1  |   A   |   7   |
| 1  |   A   |   9   |

I want to write a function that takes the rows with same id and label A and filter it based on the highest width

so the after applying the function the dataframe would be:

| Id | Label | Width |
|----|-------| ------|
| 0  |   A   |   5   |
| 0  |   B   |   4   |
| 1  |   A   |   9   |

Upvotes: 2

Views: 70

Answers (2)

Shubham Sharma
Shubham Sharma

Reputation: 71707

Let us try:

m = df['Label'].eq('A')
df_a = df.loc[df[m].groupby(['Id', 'Label'])['Width'].idxmax()]

df_out = pd.concat([df[~m], df_a]).sort_index()

Details:

Create a boolean mask with .eq specifying the condition where Label equals A:

>>> m

0     True
1     True
2    False
3     True
4     True
Name: Label, dtype: bool

filter the rows using the above mask and group this dataframe on Id and Label and aggregate Width using idxmax to get the indices on max values:

>>> df[m].groupby(['Id', 'Label'])['Width'].idxmax().tolist()
[0, 4]

>>> df_a

   Id Label  Width
0   0     A      5
4   1     A      9

finally concat the above dataframe with the dataframe containing labels other that A and sort the index to maintain the order:

>>> df_out

   Id Label  Width
0   0     A      5
2   0     B      4
4   1     A      9

Upvotes: 2

sophocles
sophocles

Reputation: 13841

You can use groupby to get your desired outcome:

df.groupby(['Id','Label'],as_index=False)['Width'].max()

   Id Label  Width
0   0     A      5
1   0     B      4
2   1     A      9

And combine it with set_index() and loc, to get more specific results:

label = 'A'
df.groupby(['Id','Label'],as_index=False)['Width'].max().set_index('Label').loc[label]

       Id  Width
Label           
A       0      5
A       1      9

Upvotes: 1

Related Questions