BKS
BKS

Reputation: 2333

Calculate percentages for subgroups in pandas dataframe

I have a dataframe df:

    VID         SFID        SFReps
0   0000F0DD    000C5AF6    9
1   0000F0DD    000E701F    16
2   0000F0DD    00481C04    1
3   0000F0DD    004DCD04    1
4   0000F0DD    006CD213    1
5   0000F0DD    00889D31    9
6   0000AAAA    00F8733A    4
7   0000AAAA    00FDD591    1
8   0000AAAA    01243458    4
9   0000AAAA    01292867    16
10  0000AAAA    0131445A    9
11  0000AAAA    013CB69F    1

I want to calculate the percentage that each SFReps represents for each VID group.

So the result should be something like:

    VID         SFID        SFReps   SFPercent
0   0000F0DD    000C5AF6    9        0.24
1   0000F0DD    000E701F    16       0.43
2   0000F0DD    00481C04    1        0.03
3   0000F0DD    004DCD04    1        0.03
4   0000F0DD    006CD213    1        0.03
5   0000F0DD    00889D31    9        0.24
6   0000AAAA    00F8733A    4        0.11
7   0000AAAA    00FDD591    1        0.03
8   0000AAAA    01243458    4        0.11
9   0000AAAA    01292867    16       0.46 
10  0000AAAA    0131445A    9        0.26
11  0000AAAA    013CB69F    1        0.03

I know I can group each VID values using groupby but after that I'm stumped.

Looping through each row is an option, but I know there is a better way to do this.

Upvotes: 1

Views: 1899

Answers (1)

jezrael
jezrael

Reputation: 862661

You can divide by new Series created by transform and sum for same index as original df:

print (df.groupby('VID')['SFReps'].transform('sum'))
0     37
1     37
2     37
3     37
4     37
5     37
6     35
7     35
8     35
9     35
10    35
11    35
Name: SFReps, dtype: int64

df['SFPercent'] = df.SFReps / df.groupby('VID')['SFReps'].transform('sum')
print (df)
         VID      SFID  SFReps  SFPercent
0   0000F0DD  000C5AF6       9   0.243243
1   0000F0DD  000E701F      16   0.432432
2   0000F0DD  00481C04       1   0.027027
3   0000F0DD  004DCD04       1   0.027027
4   0000F0DD  006CD213       1   0.027027
5   0000F0DD  00889D31       9   0.243243
6   0000AAAA  00F8733A       4   0.114286
7   0000AAAA  00FDD591       1   0.028571
8   0000AAAA  01243458       4   0.114286
9   0000AAAA  01292867      16   0.457143
10  0000AAAA  0131445A       9   0.257143
11  0000AAAA  013CB69F       1   0.028571

df['SFPercent'] = df.SFReps.div(df.groupby('VID')['SFReps'].transform('sum'))
print (df)
         VID      SFID  SFReps  SFPercent
0   0000F0DD  000C5AF6       9   0.243243
1   0000F0DD  000E701F      16   0.432432
2   0000F0DD  00481C04       1   0.027027
3   0000F0DD  004DCD04       1   0.027027
4   0000F0DD  006CD213       1   0.027027
5   0000F0DD  00889D31       9   0.243243
6   0000AAAA  00F8733A       4   0.114286
7   0000AAAA  00FDD591       1   0.028571
8   0000AAAA  01243458       4   0.114286
9   0000AAAA  01292867      16   0.457143
10  0000AAAA  0131445A       9   0.257143
11  0000AAAA  013CB69F       1   0.028571

Last if necessary add round:

df['SFPercent'] = df.SFReps.div(df.groupby('VID')['SFReps'].transform('sum')).round(2)
print (df)
         VID      SFID  SFReps  SFPercent
0   0000F0DD  000C5AF6       9       0.24
1   0000F0DD  000E701F      16       0.43
2   0000F0DD  00481C04       1       0.03
3   0000F0DD  004DCD04       1       0.03
4   0000F0DD  006CD213       1       0.03
5   0000F0DD  00889D31       9       0.24
6   0000AAAA  00F8733A       4       0.11
7   0000AAAA  00FDD591       1       0.03
8   0000AAAA  01243458       4       0.11
9   0000AAAA  01292867      16       0.46
10  0000AAAA  0131445A       9       0.26
11  0000AAAA  013CB69F       1       0.03

Upvotes: 4

Related Questions