Reputation: 37
I have a dataframe that looks like this:
| Id | Label | Width |
|----|-------| ------|
| 0 | A | 5 |
| 0 | A | 3 |
| 0 | B | 4 |
| 1 | A | 7 |
| 1 | A | 9 |
I want to write a function that takes the rows with same id and label A and filter it based on the highest width
so the after applying the function the dataframe would be:
| Id | Label | Width |
|----|-------| ------|
| 0 | A | 5 |
| 0 | B | 4 |
| 1 | A | 9 |
Upvotes: 2
Views: 70
Reputation: 71707
Let us try:
m = df['Label'].eq('A')
df_a = df.loc[df[m].groupby(['Id', 'Label'])['Width'].idxmax()]
df_out = pd.concat([df[~m], df_a]).sort_index()
Details:
Create a boolean mask with .eq
specifying the condition where Label
equals A
:
>>> m
0 True
1 True
2 False
3 True
4 True
Name: Label, dtype: bool
filter the rows using the above mask and group this dataframe on Id
and Label
and aggregate Width
using idxmax
to get the indices on max values:
>>> df[m].groupby(['Id', 'Label'])['Width'].idxmax().tolist()
[0, 4]
>>> df_a
Id Label Width
0 0 A 5
4 1 A 9
finally concat
the above dataframe with the dataframe containing labels other that A
and sort
the index to maintain the order:
>>> df_out
Id Label Width
0 0 A 5
2 0 B 4
4 1 A 9
Upvotes: 2
Reputation: 13841
You can use groupby
to get your desired outcome:
df.groupby(['Id','Label'],as_index=False)['Width'].max()
Id Label Width
0 0 A 5
1 0 B 4
2 1 A 9
And combine it with set_index()
and loc
, to get more specific results:
label = 'A'
df.groupby(['Id','Label'],as_index=False)['Width'].max().set_index('Label').loc[label]
Id Width
Label
A 0 5
A 1 9
Upvotes: 1