Reputation: 2333
I have a dataframe df:
VID SFID SFReps
0 0000F0DD 000C5AF6 9
1 0000F0DD 000E701F 16
2 0000F0DD 00481C04 1
3 0000F0DD 004DCD04 1
4 0000F0DD 006CD213 1
5 0000F0DD 00889D31 9
6 0000AAAA 00F8733A 4
7 0000AAAA 00FDD591 1
8 0000AAAA 01243458 4
9 0000AAAA 01292867 16
10 0000AAAA 0131445A 9
11 0000AAAA 013CB69F 1
I want to calculate the percentage that each SFReps represents for each VID group.
So the result should be something like:
VID SFID SFReps SFPercent
0 0000F0DD 000C5AF6 9 0.24
1 0000F0DD 000E701F 16 0.43
2 0000F0DD 00481C04 1 0.03
3 0000F0DD 004DCD04 1 0.03
4 0000F0DD 006CD213 1 0.03
5 0000F0DD 00889D31 9 0.24
6 0000AAAA 00F8733A 4 0.11
7 0000AAAA 00FDD591 1 0.03
8 0000AAAA 01243458 4 0.11
9 0000AAAA 01292867 16 0.46
10 0000AAAA 0131445A 9 0.26
11 0000AAAA 013CB69F 1 0.03
I know I can group each VID values using groupby
but after that I'm stumped.
Looping through each row is an option, but I know there is a better way to do this.
Upvotes: 1
Views: 1899
Reputation: 862661
You can divide by new Series
created by transform
and sum
for same index as original df
:
print (df.groupby('VID')['SFReps'].transform('sum'))
0 37
1 37
2 37
3 37
4 37
5 37
6 35
7 35
8 35
9 35
10 35
11 35
Name: SFReps, dtype: int64
df['SFPercent'] = df.SFReps / df.groupby('VID')['SFReps'].transform('sum')
print (df)
VID SFID SFReps SFPercent
0 0000F0DD 000C5AF6 9 0.243243
1 0000F0DD 000E701F 16 0.432432
2 0000F0DD 00481C04 1 0.027027
3 0000F0DD 004DCD04 1 0.027027
4 0000F0DD 006CD213 1 0.027027
5 0000F0DD 00889D31 9 0.243243
6 0000AAAA 00F8733A 4 0.114286
7 0000AAAA 00FDD591 1 0.028571
8 0000AAAA 01243458 4 0.114286
9 0000AAAA 01292867 16 0.457143
10 0000AAAA 0131445A 9 0.257143
11 0000AAAA 013CB69F 1 0.028571
df['SFPercent'] = df.SFReps.div(df.groupby('VID')['SFReps'].transform('sum'))
print (df)
VID SFID SFReps SFPercent
0 0000F0DD 000C5AF6 9 0.243243
1 0000F0DD 000E701F 16 0.432432
2 0000F0DD 00481C04 1 0.027027
3 0000F0DD 004DCD04 1 0.027027
4 0000F0DD 006CD213 1 0.027027
5 0000F0DD 00889D31 9 0.243243
6 0000AAAA 00F8733A 4 0.114286
7 0000AAAA 00FDD591 1 0.028571
8 0000AAAA 01243458 4 0.114286
9 0000AAAA 01292867 16 0.457143
10 0000AAAA 0131445A 9 0.257143
11 0000AAAA 013CB69F 1 0.028571
Last if necessary add round
:
df['SFPercent'] = df.SFReps.div(df.groupby('VID')['SFReps'].transform('sum')).round(2)
print (df)
VID SFID SFReps SFPercent
0 0000F0DD 000C5AF6 9 0.24
1 0000F0DD 000E701F 16 0.43
2 0000F0DD 00481C04 1 0.03
3 0000F0DD 004DCD04 1 0.03
4 0000F0DD 006CD213 1 0.03
5 0000F0DD 00889D31 9 0.24
6 0000AAAA 00F8733A 4 0.11
7 0000AAAA 00FDD591 1 0.03
8 0000AAAA 01243458 4 0.11
9 0000AAAA 01292867 16 0.46
10 0000AAAA 0131445A 9 0.26
11 0000AAAA 013CB69F 1 0.03
Upvotes: 4