ChrisOram
ChrisOram

Reputation: 1434

Pandas GroupBy : % of boolean flags which are True

The following is a minimal example of my data:

   Id Session Flag toSum
0   1   1      1    2
1   1   2      0    4
2   1   3      0    5
3   1   4      1    6
4   1   5      1    3
5   1   6      0    0
6   1   7      1    1
7   2   1      0    4
8   2   2      1    6
9   2   3      0    2
10  3   1      1    4
11  4   1      1    2
12  4   2      0    1
13  4   3      0    5
14  4   4      1    10

Can be reproduced with:

df = pd.DataFrame({
    'Id': [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 4, 4, 4, 4],
    'Session':[1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 1, 1, 2, 3, 4],
    'Flag': [1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1],
    'toSum': [2, 4, 5, 6, 3, 0, 1, 4, 6, 2, 4, 2, 1, 5, 10]
})

I want to group by Id and aggregate the Flag column to percentage flagged per Id and the toSum column to sum per Id.

I.e., to produce:

    Id  Flag %  toSum
0   1   57.14   21
1   2   33.33   12
2   3   100.0   4
3   4   50.0    18

How can this be achieved?

Upvotes: 3

Views: 298

Answers (2)

jezrael
jezrael

Reputation: 863116

First aggregate mean and sum and then if necessary change format of column:

#trick for pass column name with space to agg
df = df.groupby('Id', as_index=False).agg(**{'Flag %':('Flag', 'mean'),
                                             'toSum': ('toSum', 'sum')})

df['Flag %'] = df['Flag %'].mul(100).round(2)
print (df)
   Id  Flag %  toSum
0   1   57.14     21
1   2   33.33     12
2   3  100.00      4
3   4   50.00     18

Or:

df = df.groupby('Id', as_index=False).agg({'Flag': 'mean', 'toSum': sum})

df['Flag'] = df['Flag'].map("{:.2%}".format)
print (df)
   Id     Flag  toSum
0   1   57.14%     21
1   2   33.33%     12
2   3  100.00%      4
3   4   50.00%     18

Upvotes: 3

U13-Forward
U13-Forward

Reputation: 71600

Use agg with division:

>>> df.groupby('Id', as_index=False).agg({'Flag': lambda x: round(x.eq(1).sum() / len(x) * 100, 2), 'toSum': sum})
   Id    Flag  toSum
0   1   57.14     21
1   2   33.33     12
2   3  100.00      4
3   4   50.00     18
>>> 

With percentage sign:

>>> df.groupby('Id', as_index=False).agg({'Flag': lambda x: f'{round(x.eq(1).sum() / len(x) * 100, 2)}%', 'toSum': sum})
   Id    Flag  toSum
0   1  57.14%     21
1   2  33.33%     12
2   3  100.0%      4
3   4   50.0%     18
>>> 

Upvotes: 0

Related Questions