Shubham R
Shubham R

Reputation: 7644

Grouping Multiple columns and sum of count in pandas df

i have a table in pandas df

 master_id    pidx   pidy   flag   count
    xxx        a      b       A      10
    xxx        a      c       A      20
    xxx        a      d       A      30
    xxx        b      d       A      40
    xxx        a      c       C      50
    xxx        a      c       C      60
    xxx        x      y       C      70
    xxx        x      y       C      80

i want to do a grouping on multiple columns and summing the count irrespective of flag.

i.e

 xxx  a    c   A   20
 xxx  a    c   C   50
 xxx  a    c   C   60

final output should be

 xxx  a   c   A  130

final table should be

 master_id   pidx   pidy   flag   count
    xxx        a      b       A      10
    xxx        a      c       A      130
    xxx        a      d       A      30
    xxx        b      d       A      40
    xxx        x      y       C      150

Upvotes: 1

Views: 2055

Answers (1)

jezrael
jezrael

Reputation: 862661

I think you need groupby with agg - column flag is aggregate by first and column count by sum:

df = df.groupby(['pidx','pidy']).agg({'flag':'first', 'count':'sum'}).reset_index()
print (df)
  pidx pidy  count flag
0    a    b     10    A
1    a    c    130    A
2    a    d     30    A
3    b    d     40    A
4    x    y    150    C

because if use groupby by pidx, pidy and flag, output is different:

df = df.groupby(['pidx','pidy','flag'], as_index=False)['count'].sum()
print (df)
  pidx pidy flag  count
0    a    b    A     10
1    a    c    A     20
2    a    c    C    110
3    a    d    A     30
4    b    d    A     40
5    x    y    C    150

Upvotes: 3

Related Questions