Reputation: 742
I have a dataframe (df)
a b c
1 2 20
1 2 15
2 4 30
3 2 20
3 2 15
and I want to select only the rows with the max values in column c
I tried
a = df.loc[df.groupby('b')['c'].idxmax()]
but the groupby removes duplicates, so I get
a b c
1 2 20
2 4 30
It removes the rows where a is 3 because they have the same b and c values as the rows where a is 1.
Is there any way to write the code so that it does not remove duplicates?
Upvotes: 1
Views: 112
Reputation: 862691
I think you need:
df = df[df['c'] == df.groupby('b')['c'].transform('max')]
print (df)
a b c
0 1 2 20
2 2 4 30
3 3 2 20
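As a minimal sketch of why the mask works, assuming the question's frame is rebuilt by hand: transform('max') returns a Series aligned to the original index that holds each row's group maximum, so comparing it with column c keeps every row that equals its group's maximum. The names group_max, mask and result are only illustrative.

```python
import pandas as pd

# Rebuild the frame from the question (assumed default integer index).
df = pd.DataFrame({"a": [1, 1, 2, 3, 3],
                   "b": [2, 2, 4, 2, 2],
                   "c": [20, 15, 30, 20, 15]})

# One value per original row: the max of c within that row's b group.
group_max = df.groupby("b")["c"].transform("max")

# True wherever the row's c equals its group's max; ties are all kept.
mask = df["c"] == group_max
result = df[mask]
print(result)
```

Because the mask has the same index as df, no rows are collapsed the way they are when idxmax returns a single label per group.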
The difference is easier to see with changed data:
print (df)
a b c
0 1 2 30
1 1 2 30
2 1 2 15
3 2 4 30
4 3 2 20
5 3 2 15
# only one max row per group of a and b
a = df.loc[df.groupby(['a', 'b'])['c'].idxmax()]
print (a)
a b c
0 1 2 30
3 2 4 30
4 3 2 20
# all max rows per group of b
df1 = df[df['c'] == df.groupby('b')['c'].transform('max')]
print (df1)
a b c
0 1 2 30
1 1 2 30
3 2 4 30
# all max rows per group of a and b
df2 = df[df['c'] == df.groupby(['a', 'b'])['c'].transform('max')]
print (df2)
a b c
0 1 2 30
1 1 2 30
3 2 4 30
4 3 2 20
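The three variants above can be reproduced in one runnable sketch. The helper rows_at_group_max is a hypothetical name introduced here for illustration, not a pandas function, and the frame is the changed data from this answer typed in by hand.

```python
import pandas as pd

def rows_at_group_max(frame, keys, value):
    """Return every row whose `value` equals the max of its `keys` group."""
    group_max = frame.groupby(keys)[value].transform("max")
    return frame[frame[value] == group_max]

# The changed data used in the comparison.
changed = pd.DataFrame({"a": [1, 1, 1, 2, 3, 3],
                        "b": [2, 2, 2, 4, 2, 2],
                        "c": [30, 30, 15, 30, 20, 15]})

# idxmax gives one label per group, so tied rows are dropped.
one_per_group = changed.loc[changed.groupby(["a", "b"])["c"].idxmax()]

# The transform-based mask keeps all tied rows.
all_by_b = rows_at_group_max(changed, "b", "c")
all_by_ab = rows_at_group_max(changed, ["a", "b"], "c")

print(one_per_group)
print(all_by_b)
print(all_by_ab)
```

Grouping by b alone drops the a = 3 rows because their maximum (20) is below the b = 2 group maximum (30), while grouping by a and b keeps the best row or rows for each pair.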
Upvotes: 2
Reputation: 25997
Just also take column a into account when you do the groupby:
a = df.loc[df.groupby(['a', 'b'])['c'].idxmax()]
a b c
0 1 2 20
2 2 4 30
3 3 2 20
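A short sketch of this approach and its one caveat, assuming the question's frame is rebuilt by hand: idxmax returns a single label per group, so if a group has several rows tied at the maximum, only the first is kept. The extra row in tied is hypothetical and is added only to show that behavior.

```python
import pandas as pd

# Rebuild the frame from the question (assumed default integer index).
df = pd.DataFrame({"a": [1, 1, 2, 3, 3],
                   "b": [2, 2, 4, 2, 2],
                   "c": [20, 15, 30, 20, 15]})

# One row per (a, b) pair: the row holding that pair's max of c.
picked = df.loc[df.groupby(["a", "b"])["c"].idxmax()]
print(picked)

# Hypothetical tie: append a second (1, 2, 20) row.
extra = pd.DataFrame({"a": [1], "b": [2], "c": [20]})
tied = pd.concat([df, extra], ignore_index=True)

# Still one row per pair; the tied row at index 5 is not returned.
picked_tied = tied.loc[tied.groupby(["a", "b"])["c"].idxmax()]
print(picked_tied)
```

If tied rows within a group must all be kept, a comparison against transform('max') is the safer choice.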
Upvotes: 2