Retrieve rows with highest value with condition

Question

I have a dataframe that looks like this:

| Id | Label | Width |
|----|-------| ------|
| 0  |   A   |   5   |
| 0  |   A   |   3   |
| 0  |   B   |   4   |
| 1  |   A   |   7   |
| 1  |   A   |   9   |

I want to write a function that takes the rows with the same Id and label A and keeps only the one with the highest width,

so that after applying the function the dataframe would be:

| Id | Label | Width |
|----|-------| ------|
| 0  |   A   |   5   |
| 0  |   B   |   4   |
| 1  |   A   |   9   |

Shubham Sharma · Accepted Answer

Let us try:

m = df['Label'].eq('A')
df_a = df.loc[df[m].groupby(['Id', 'Label'])['Width'].idxmax()]

df_out = pd.concat([df[~m], df_a]).sort_index()

Details:

Create a boolean mask with .eq specifying the condition where Label equals A:

>>> m

0     True
1     True
2    False
3     True
4     True
Name: Label, dtype: bool

Filter the rows using the above mask, group the result on Id and Label, and aggregate Width using idxmax to get the index labels of the rows with the maximum values:

>>> df[m].groupby(['Id', 'Label'])['Width'].idxmax().tolist()
[0, 4]

>>> df_a

   Id Label  Width
0   0     A      5
4   1     A      9
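One detail worth spelling out: idxmax returns index labels rather than positions, which is why the result is passed to df.loc and not df.iloc. The sketch below illustrates this on the question's data, using a non-default index (10 to 14) that is purely an assumption made for the demonstration:

```python
import pandas as pd

# Same data as in the question, but with a hypothetical non-default index
# so that labels and positions differ.
df = pd.DataFrame(
    {
        "Id": [0, 0, 0, 1, 1],
        "Label": ["A", "A", "B", "A", "A"],
        "Width": [5, 3, 4, 7, 9],
    },
    index=[10, 11, 12, 13, 14],
)

m = df["Label"].eq("A")

# idxmax yields the index *labels* of the widest row in each group.
idx = df[m].groupby(["Id", "Label"])["Width"].idxmax()
print(idx.tolist())

# Label-based lookup, so .loc is the matching accessor.
df_a = df.loc[idx]
print(df_a)
```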

Finally, concat the above dataframe with the rows whose label is other than A, and sort the index to restore the original order:

>>> df_out

   Id Label  Width
0   0     A      5
2   0     B      4
4   1     A      9
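Since the question asks for a function, the steps above could be wrapped up roughly as follows. This is a minimal sketch: the name keep_max_width and its label parameter are made up for illustration, and it assumes the index order reflects the original row order, so that sort_index restores it.

```python
import pandas as pd


def keep_max_width(df: pd.DataFrame, label: str = "A") -> pd.DataFrame:
    """Among rows carrying `label`, keep only the widest row per Id.

    Rows with any other label pass through unchanged.
    (Hypothetical helper name, not part of pandas.)
    """
    # Boolean mask of the rows the condition applies to.
    m = df["Label"].eq(label)
    # Index labels of the max-Width row in each (Id, Label) group.
    keep = df[m].groupby(["Id", "Label"])["Width"].idxmax()
    # Recombine with the untouched rows and restore the index order.
    return pd.concat([df[~m], df.loc[keep]]).sort_index()


# Data from the question.
df = pd.DataFrame(
    {
        "Id": [0, 0, 0, 1, 1],
        "Label": ["A", "A", "B", "A", "A"],
        "Width": [5, 3, 4, 7, 9],
    }
)

df_out = keep_max_width(df)
print(df_out)
```

When several rows in a group tie for the maximum Width, idxmax picks the first occurrence, so only one of them is kept.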
