Reputation: 896
I have a large dataframe which has a similar pattern as below:
X Y Z
0 a p 2
1 a q 5
2 a r 6
3 a s 3
4 b w 10
5 b z 20
6 b y 9
7 b x 20
And can be constructed as:
df = {
'X': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
'Y': ['p', 'q', 'r', 's', 'w', 'x', 'y', 'z'],
'Z': [2, 5, 6, 3, 10, 20, 9, 5]
}
Now I want to group this dataframe by the first column i.e., X
and take max
from the Z
column and its corresponding value from Y
. And if there are two max values in Z
, then I would like to take alphabetically first value from Y
.
So my expected result would look like:
X Y Z
a r 6
b x 20
I have tried groupby('X', as_index=False).agg({'Z': 'max', 'Y': 'first'})
but this selects max from Z
and first from Y
both at the same time.
Additionally I know there is a pd.series.groupby.nlargest(1)
approach, but this would take a lot of time for my dataset.
Any suggestions on how could I proceed would be appreciated.
Thanks in advance:)
Upvotes: 3
Views: 96
Reputation: 71707
Let us try sort_values
+ drop_duplicates
:
df.sort_values(['X', 'Z', 'Y'], ascending=[True, False, True]).drop_duplicates('X')
X Y Z
2 a r 6
5 b x 20
Upvotes: 5