Reputation: 1434
The following is a minimal example of my data:
Id Session Flag toSum
0 1 1 1 2
1 1 2 0 4
2 1 3 0 5
3 1 4 1 6
4 1 5 1 3
5 1 6 0 0
6 1 7 1 1
7 2 1 0 4
8 2 2 1 6
9 2 3 0 2
10 3 1 1 4
11 4 1 1 2
12 4 2 0 1
13 4 3 0 5
14 4 4 1 10
Can be reproduced with:
df = pd.DataFrame({
'Id': [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 4, 4, 4, 4],
'Session':[1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 1, 1, 2, 3, 4],
'Flag': [1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1],
'toSum': [2, 4, 5, 6, 3, 0, 1, 4, 6, 2, 4, 2, 1, 5, 10]
})
I want to group by Id
and aggregate the Flag
column to percentage flagged per Id and the toSum
column to sum per Id.
I.e., to produce:
Id Flag % toSum
0 1 57.14 21
1 2 33.33 12
2 3 100.0 4
3 4 50.0 18
How can this be achieved?
Upvotes: 3
Views: 298
Reputation: 863116
First aggregate mean
and sum
and then if necessary change format of column:
#trick for pass column name with space to agg
df = df.groupby('Id', as_index=False).agg(**{'Flag %':('Flag', 'mean'),
'toSum': ('toSum', 'sum')})
df['Flag %'] = df['Flag %'].mul(100).round(2)
print (df)
Id Flag % toSum
0 1 57.14 21
1 2 33.33 12
2 3 100.00 4
3 4 50.00 18
Or:
df = df.groupby('Id', as_index=False).agg({'Flag': 'mean', 'toSum': sum})
df['Flag'] = df['Flag'].map("{:.2%}".format)
print (df)
Id Flag toSum
0 1 57.14% 21
1 2 33.33% 12
2 3 100.00% 4
3 4 50.00% 18
Upvotes: 3
Reputation: 71600
Use agg
with division:
>>> df.groupby('Id', as_index=False).agg({'Flag': lambda x: round(x.eq(1).sum() / len(x) * 100, 2), 'toSum': sum})
Id Flag toSum
0 1 57.14 21
1 2 33.33 12
2 3 100.00 4
3 4 50.00 18
>>>
With percentage sign:
>>> df.groupby('Id', as_index=False).agg({'Flag': lambda x: f'{round(x.eq(1).sum() / len(x) * 100, 2)}%', 'toSum': sum})
Id Flag toSum
0 1 57.14% 21
1 2 33.33% 12
2 3 100.0% 4
3 4 50.0% 18
>>>
Upvotes: 0