groupby.apply apply only to a part of the columns

Question

I'm trying to do a groupby.apply on a dataframe but only apply to some of the columns.

My data looks like this:

   a  b  c  d  e
0  1  1  1  4  9
1  1  2  2  7  0
2  1  1  3  4  7
3  2  1  4  3  3
4  2  2  5  2  8
5  2  3  6  6  3
6  2  1  7  3  6
7  3  2  8  4  4
8  3  3  9  5  2

and I would like to group by a, b and d (grouping all rows where all three columns are the same), and then sum columns c and e to get:

   a  b   c  d   e
0  1  1   4  4  16
1  1  2   2  7   0
2  2  1  11  3   9
3  2  2   5  2   8
4  2  3   6  6   3
5  3  2   8  4   4
6  3  3   9  5   2

(I summed values in columns c and e in rows (0,2) and (3,6))

I tried the following:

df.groupby(['a','b','d'], as_index=False).apply(sum)

But I get:

       a  b   c  d   e
a b d                 
1 1 4  2  2   4  8  16
  2 7  1  2   2  7   0
2 1 3  4  2  11  6   9
  2 2  2  2   5  2   8
  3 6  2  3   6  6   3
3 2 4  3  2   8  4   4
  3 5  3  3   9  5   2

My problem here is that the values in columns a, b and d were summed as well, while I wanted them left as-is. How can I avoid applying the sum to the columns I'm grouping by?

BENY · Accepted Answer

Select the columns that need to be summed (as a list) before aggregating; that should solve the problem.

df.groupby(['a','b','d'], as_index=False)[['c','e']].sum()
Out[394]: 
   a  b  d   c   e
0  1  1  4   4  16
1  1  2  7   2   0
2  2  1  3  11   9
3  2  2  2   5   8
4  2  3  6   6   3
5  3  2  4   8   4
6  3  3  5   9   2
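
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the frame from the question (the name `df` and the final column reordering are my own additions, not part of the original post) and selects the columns with a list, since recent pandas versions reject the tuple-style `['c','e']` selection on a groupby.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "b": [1, 2, 1, 1, 2, 3, 1, 2, 3],
    "c": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "d": [4, 7, 4, 3, 2, 6, 3, 4, 5],
    "e": [9, 0, 7, 3, 8, 3, 6, 4, 2],
})

# Group by a, b, d and sum only c and e.
# The double brackets pass a list of column names to the groupby selection.
result = df.groupby(["a", "b", "d"], as_index=False)[["c", "e"]].sum()

# Optional: restore the column order used in the question's desired output.
result = result[["a", "b", "c", "d", "e"]]

print(result)
```

Because the grouping keys are never passed to `sum`, they come back unchanged as ordinary columns (thanks to `as_index=False`), and only c and e are aggregated.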
