Reputation: 1646
I'm trying to do a groupby.apply on a dataframe but only apply to some of the columns.
My data looks like this:
   a  b  c  d  e
0  1  1  1  4  9
1  1  2  2  7  0
2  1  1  3  4  7
3  2  1  4  3  3
4  2  2  5  2  8
5  2  3  6  6  3
6  2  1  7  3  6
7  3  2  8  4  4
8  3  3  9  5  2
and I would like to group by a, b and d (group all rows where all three columns are the same), and then sum columns c and e to get:
   a  b   c  d   e
0  1  1   4  4  16
1  1  2   2  7   0
2  2  1  11  3   9
3  2  2   5  2   8
4  2  3   6  6   3
5  3  2   8  4   4
6  3  3   9  5   2
(I summed the values in columns c and e in rows (0,2) and (3,6))
I tried the following:
a.groupby(['a','b','d'], as_index = False).apply(sum)
But I get:
         a  b   c  d   e
a b d
1 1 4    2  2   4  8  16
  2 7    1  2   2  7   0
2 1 3    4  2  11  6   9
  2 2    2  2   5  2   8
  3 6    2  3   6  6   3
3 2 4    3  2   8  4   4
  3 5    3  3   9  5   2
My problem here is that the values in columns a, b and d were summed as well, while I wanted them left as-is. How can I avoid applying the sum to the columns I'm grouping by?
Upvotes: 0
Views: 70
Reputation: 323226
Specify which columns need to be summed, passed as a list after the groupby; that should solve the problem:
df.groupby(['a','b','d'], as_index = False)[['c','e']].sum()
Out[394]:
   a  b  d   c   e
0  1  1  4   4  16
1  1  2  7   2   0
2  2  1  3  11   9
3  2  2  2   5   8
4  2  3  6   6   3
5  3  2  4   8   4
6  3  3  5   9   2
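For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample frame from the question, selects the value columns with a list (recent pandas versions reject the bare tuple form ['c','e'] on a groupby), and reorders the columns to match the layout asked for. The names df and result are just my choice for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "b": [1, 2, 1, 1, 2, 3, 1, 2, 3],
    "c": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "d": [4, 7, 4, 3, 2, 6, 3, 4, 5],
    "e": [9, 0, 7, 3, 8, 3, 6, 4, 2],
})

# Group by the key columns and sum only c and e.
# The double brackets pass a list of column names to the groupby object.
result = df.groupby(["a", "b", "d"], as_index=False)[["c", "e"]].sum()

# Optional: put the columns back in the original a, b, c, d, e order.
result = result[["a", "b", "c", "d", "e"]]
print(result)
```

Because only c and e are selected before calling sum, the grouping keys a, b and d come through unchanged as ordinary columns (thanks to as_index=False).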
Upvotes: 2