kk415kk
kk415kk

Reputation: 1247

Finding the maximum entry based on another column in a data frame

Suppose I have a data frame with 3 columns: A, B, C. I want to group by column A, and find the row (for each unique A) with the maximum entry in C, so that I can store that row.A, row.B, row.C into a dictionary elsewhere.

What's the best way to do this without using iterrows?

Upvotes: 0

Views: 48

Answers (2)

Warren Weckesser
Warren Weckesser

Reputation: 114791

Here's another method. If df is the DataFrame, you can write df.groupby('A').apply(lambda d: d.ix[d['C'].argmax()]). For example,

In [96]: df
Out[96]: 
    A  B  C
0   1  0  3
1   3  0  4
2   0  4  5
3   2  4  0
4   3  1  1
5   1  6  2
6   3  6  0
7   4  0  1
8   2  3  4
9   0  5  0
10  7  6  5
11  3  1  2

In [97]: g = df.groupby('A').apply(lambda d: d['C'].argmax())

In [98]: g
Out[98]: 
A
0     2
1     0
2     8
3     1
4     7
7    10
dtype: int64

In [99]: df.ix[g.values]
Out[99]: 
    A  B  C
2   0  4  5
0   1  0  3
8   2  3  4
1   3  0  4
7   4  0  1
10  7  6  5

Upvotes: 1

John Zwinck
John Zwinck

Reputation: 249123

# generate sample data
import pandas as pd
df = pd.DataFrame(np.random.randint(0,10,(10,3)))
df.columns = ['A','B','C']

# sort by C, group by A, take last row of each group
df.sort('C').groupby('A').nth(-1)

Upvotes: 3

Related Questions