Reputation: 1247
Suppose I have a data frame with 3 columns: A, B, C
. I want to group by column A
, and find the row (for each unique A) with the maximum entry in C
, so that I can store that row.A, row.B, row.C
into a dictionary elsewhere.
What's the best way to do this without using iterrows
?
Upvotes: 0
Views: 48
Reputation: 114791
Here's another method. If df
is the DataFrame, you can write df.groupby('A').apply(lambda d: d.ix[d['C'].argmax()])
.
For example,
In [96]: df
Out[96]:
A B C
0 1 0 3
1 3 0 4
2 0 4 5
3 2 4 0
4 3 1 1
5 1 6 2
6 3 6 0
7 4 0 1
8 2 3 4
9 0 5 0
10 7 6 5
11 3 1 2
In [97]: g = df.groupby('A').apply(lambda d: d['C'].argmax())
In [98]: g
Out[98]:
A
0 2
1 0
2 8
3 1
4 7
7 10
dtype: int64
In [99]: df.ix[g.values]
Out[99]:
A B C
2 0 4 5
0 1 0 3
8 2 3 4
1 3 0 4
7 4 0 1
10 7 6 5
Upvotes: 1
Reputation: 249123
# generate sample data
import pandas as pd
df = pd.DataFrame(np.random.randint(0,10,(10,3)))
df.columns = ['A','B','C']
# sort by C, group by A, take last row of each group
df.sort('C').groupby('A').nth(-1)
Upvotes: 3