Reputation: 7644
i have a table in pandas df
master_id pidx pidy flag count
xxx a b A 10
xxx a c A 20
xxx a d A 30
xxx b d A 40
xxx a c C 50
xxx a c C 60
xxx x y C 70
xxx x y C 80
i want to do a grouping on multiple columns and summing the count irrespective of flag.
i.e
xxx a c A 20
xxx a c C 50
xxx a c C 60
final output should be
xxx a c A 130
final table should be
master_id pidx pidy flag count
xxx a b A 10
xxx a c A 130
xxx a d A 30
xxx b d A 40
xxx x y C 150
Upvotes: 1
Views: 2055
Reputation: 862661
I think you need groupby
with agg
- column flag
is aggregate by first
and column count
by sum
:
df = df.groupby(['pidx','pidy']).agg({'flag':'first', 'count':'sum'}).reset_index()
print (df)
pidx pidy count flag
0 a b 10 A
1 a c 130 A
2 a d 30 A
3 b d 40 A
4 x y 150 C
because if use groupby
by pidx
, pidy
and flag
, output is different:
df = df.groupby(['pidx','pidy','flag'], as_index=False)['count'].sum()
print (df)
pidx pidy flag count
0 a b A 10
1 a c A 20
2 a c C 110
3 a d A 30
4 b d A 40
5 x y C 150
Upvotes: 3