Reputation: 2681
I'm trying to follow an example of groupby from the documentation here. As per the example, I first create a data frame:
df = pd.DataFrame({'A': 'a a b'.split(), 'B': [1, 2, 3], 'C': [4, 6, 5]})
Now, let's group by the column labeled "A" and sum the other two by its values:
df.groupby('A').sum()
This does the reasonable thing, grouping by "A" and producing:
   B   C
A
a  3  10
b  3   5
Now, let's try the same thing, but explicitly define the sum() function:
df.groupby('A', group_keys=False).apply(lambda x: np.sum(x))
This, for some reason, also applies the function to the entries of the "A" column, so the strings in it get concatenated. Other numeric functions (like square) throw errors, since they are applied to the strings as well. In fact, it causes the examples provided in the link above to not work. The result is:
    A  B   C
A
a  aa  3  10
b   b  3   5
I tried Python 2.7 and 3.6 with the same results. How can I make it do the intelligent thing and not apply the function to the column I am grouping by?
Upvotes: 1
Views: 2653
Reputation: 7994
You can also specify the columns you want to select.
df.groupby('A')[["B", "C"]].apply(lambda x: np.sum(x))
   B   C
A
a  3  10
b  3   5
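For reference, here is a minimal self-contained sketch of the column-selection approach. It assumes a recent pandas, where the columns have to be passed as a list (double brackets). The name `result` is only illustrative, and `g.sum()` is swapped in for `np.sum(x)` so the per-column sum is explicit.

```python
import pandas as pd

# Same frame as in the question.
df = pd.DataFrame({'A': 'a a b'.split(), 'B': [1, 2, 3], 'C': [4, 6, 5]})

# Select the numeric columns first, so the function never sees column "A".
# `result` is an illustrative name, not something from the original post.
result = df.groupby('A')[['B', 'C']].apply(lambda g: g.sum())
print(result)
```

Because "A" is not among the selected columns, each group handed to the lambda contains only "B" and "C".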
Upvotes: 1
Reputation: 214957
There's probably no intelligent way for groupby.apply to do that other than to drop the group variable in apply:
df.groupby('A').apply(lambda g: g.drop(columns='A').sum())
#    B   C
# A
# a  3  10
# b  3   5
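A different way to keep the function away from the grouping column, sketched here on the assumption that it is acceptable to move "A" into the index before grouping. This is an alternative to dropping the column inside apply, not the same technique, and the names `summed` and `squared` are illustrative.

```python
import numpy as np
import pandas as pd

# Same frame as in the question.
df = pd.DataFrame({'A': 'a a b'.split(), 'B': [1, 2, 3], 'C': [4, 6, 5]})

# With "A" in the index, each group passed to apply holds only B and C,
# so nothing has to be dropped inside the function.
summed = df.set_index('A').groupby(level='A').apply(lambda g: g.sum())

# Element-wise functions such as np.square can go through transform
# on the selected numeric columns; the original row index is kept.
squared = df.groupby('A')[['B', 'C']].transform(np.square)

print(summed)
print(squared)
```

Grouping by an index level means the strings in "A" are never part of the data the function receives, which also avoids the errors with numeric functions mentioned in the question.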
Upvotes: 2