Can anyone help me understand why the two calls to apply below behave differently? Thank you.
In [34]: df
Out[34]:
   A  B  C
0  1  0  0
1  1  7  4
2  2  9  8
3  2  2  4
4  2  2  1
5  3  3  3
6  3  3  2
7  3  5  7
In [35]: g = df.groupby('A')
In [36]: g.apply(max)
Out[36]:
   A  B  C
A
1  1  7  4
2  2  9  8
3  3  5  7
In [37]: g.apply(lambda x: max(x))
Out[37]:
A
1    C
2    C
3    C
dtype: object
Short answer: you probably just want
df.groupby('A').max()
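A minimal sketch of what the short answer returns, rebuilding the frame from the question (the values are copied from Out[34]). It gives the per-group maximum of every non-grouping column, with A moved into the index.

```python
import pandas as pd

# Rebuild the frame shown in Out[34]
df = pd.DataFrame({
    "A": [1, 1, 2, 2, 2, 3, 3, 3],
    "B": [0, 7, 9, 2, 2, 3, 3, 5],
    "C": [0, 4, 8, 4, 1, 3, 2, 7],
})

# Column-wise maximum within each group; the key A becomes the index
result = df.groupby("A").max()
print(result)
```

For this data the rows are 1, 2 and 3, with B = 7, 9, 5 and C = 4, 8, 7.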
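Longer answer: max is a generic Python builtin that returns the largest element of any iterable. Iterating over a DataFrame yields its column labels, not its rows or values, so calling Python's max on each group just returns the "largest" column label. With columns A, B and C that is 'C' for every group, which is exactly what you see in your second case.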
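A small sketch of that iteration behaviour on the same frame: the builtin max only ever sees the label strings 'A', 'B' and 'C'. The dict comprehension mimics what the lambda in In [37] receives for each group, without going through apply.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 2, 2, 2, 3, 3, 3],
    "B": [0, 7, 9, 2, 2, 3, 3, 5],
    "C": [0, 4, 8, 4, 1, 3, 2, 7],
})

# Iterating a DataFrame yields its column labels, not rows
labels = list(df)  # ['A', 'B', 'C']

# So the builtin max compares label strings and returns 'C'
largest_label = max(df)

# Every group has the same columns, so each group also yields 'C'
per_group = {key: max(grp) for key, grp in df.groupby("A")}
print(per_group)
```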
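In the first case, pandas has interception logic that recognizes Python builtins such as sum and max passed to apply and substitutes a column-wise reduction, so g.apply(sum) behaves like g.sum(). That is why g.apply(max) returns the maximum of each column per group. Note that Out[36] also keeps the grouping column A as a column, whereas g.max() only keeps it as the index.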
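To make the first case explicit without relying on that interception, here is a sketch that reduces each group column-wise with DataFrame.max(). Assuming the interception amounts to such a column-wise max, this reproduces the table in Out[36], including column A. The names per_group and table are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 2, 2, 2, 3, 3, 3],
    "B": [0, 7, 9, 2, 2, 3, 3, 5],
    "C": [0, 4, 8, 4, 1, 3, 2, 7],
})

# DataFrame.max() reduces column-wise (one value per column),
# unlike the builtin max, which only sees the column labels
per_group = {key: grp.max() for key, grp in df.groupby("A")}

# Turn the per-group Series into rows and name the index after the key
table = pd.DataFrame(per_group).T.rename_axis("A")
print(table)
```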