Rohit Pandey

Reputation: 2681

Pandas dataframe group by doesn't remove grouped key

I'm trying to follow a groupby example from the pandas documentation here. Following that example, I first create a DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': 'a a b'.split(), 'B': [1, 2, 3], 'C': [4, 6, 5]})

Now, let's group by column "A" and sum the other two columns within each group:

df.groupby('A').sum()

This does the reasonable thing, grouping by "A" and producing:

   B   C
A
a  3  10
b  3   5

Now, let's try the same thing, but pass the summing function explicitly through apply():

df.groupby('A', group_keys=False).apply(lambda x: np.sum(x))

For some reason I can't explain, this also applies the function to the entries of the "A" column. Other numeric functions (such as squaring) then throw errors, because they get applied to the strings. In fact, this breaks the examples in the documentation linked above.

    A  B   C
A
a  aa  3  10
b   b  3   5
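A quick way to see where the "aa" comes from, assuming the df defined above: iterating over the GroupBy yields (key, sub-DataFrame) pairs, and in the pandas versions discussed here those sub-DataFrames, grouping column included, are what apply hands to the function. The dictionary name group_columns is hypothetical, used only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': 'a a b'.split(), 'B': [1, 2, 3], 'C': [4, 6, 5]})

# Iterating over a GroupBy yields (group key, sub-DataFrame) pairs.
# group_columns is a hypothetical name used only for this illustration.
group_columns = {}
for key, group in df.groupby('A'):
    # Each sub-DataFrame still carries the grouping column 'A', so a
    # function applied to the whole group also operates on 'A'.
    group_columns[key] = list(group.columns)
    print(key, group_columns[key])
```

Summing the "a" group column by column therefore also adds the strings 'a' + 'a', which is the "aa" in the output above.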

I tried Python 2.7 and 3.6 with the same results. How can I make it do the sensible thing and not apply the function to the column I am grouping by?

Upvotes: 1

Views: 2653

Answers (2)

Tai

Reputation: 7994

You can also select the columns you want the function applied to before calling apply:

df.groupby('A')[["B", "C"]].apply(lambda x: np.sum(x))

   B   C
A
a  3  10
b  3   5
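Selecting columns also handles the squaring case mentioned in the question. A minimal sketch, assuming the same df: the list inside the brackets (double brackets, which newer pandas versions require for multiple columns) means each group passed to the lambda holds only B and C, so no arithmetic is attempted on the strings in A. The name sum_of_squares is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': 'a a b'.split(), 'B': [1, 2, 3], 'C': [4, 6, 5]})

# The inner list selects only the value columns, so each group the
# lambda receives holds B and C but not the grouping column A.
sum_of_squares = df.groupby('A')[['B', 'C']].apply(lambda g: (g ** 2).sum())
print(sum_of_squares)
# Group 'a': B = 1 + 4 = 5, C = 16 + 36 = 52; group 'b': B = 9, C = 25.
```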

Upvotes: 1

akuiper

Reputation: 214957

There's probably no clean way to make groupby.apply do that, other than dropping the grouping column inside the applied function:

df.groupby('A').apply(lambda g: g.drop(columns='A').sum())

#   B   C
#A
#a  3  10
#b  3   5
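When the function reduces each column to a single value, as sum does, a different technique (not the apply-based approach above) avoids the problem entirely: GroupBy.agg calls the function once per non-grouping column, each call receiving a Series, so column A is excluded automatically. A sketch assuming the same df; the name totals is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': 'a a b'.split(), 'B': [1, 2, 3], 'C': [4, 6, 5]})

# agg() calls the lambda separately on each value column (one Series per
# group and column); the grouping column 'A' is left out automatically.
totals = df.groupby('A').agg(lambda s: np.sum(s))
print(totals)
```

This only suits per-column reductions; for functions that need to see the whole group at once, selecting the columns first or dropping the grouping column inside apply fits better.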

Upvotes: 2
