Reputation: 4521
I have a pandas DataFrame like this:
subject bool Count
1 False 329232
1 True 73896
2 False 268338
2 True 76424
3 False 186167
3 True 27078
4 False 172417
4 True 113268
I would like to turn Count
into percents for each subject group. So for example, row 1 would be 329232 / (329232 + 73896) = 0.816
and row 2 would be 73896/ (329232 + 73896) = 0.183
. Then the total would change for group 2, and so on.
Is this possible to do by a groupby? I tried iterating over the rows with little success.
Upvotes: 5
Views: 4893
Reputation: 17144
This works for me:
import numpy as np
import pandas as pd
# data
df = pd.DataFrame({'subject': [1, 1, 2, 2, 3, 3, 4, 4],
'bool': [False, True, False, True, False, True, False, True],
'Count': [329232, 73896, 268338, 76424, 186167, 27078, 172417, 113268]})
# answer
df['Per_Subject_Count_Pct'] = df['Count'].div(
df.groupby('subject')['Count'].transform(lambda x: x.sum()))
print(df)
Gives:
subject bool Count Per_Subject_Count_Pct
0 1 False 329232 0.816693
1 1 True 73896 0.183307
2 2 False 268338 0.778328
3 2 True 76424 0.221672
4 3 False 186167 0.873019
5 3 True 27078 0.126981
6 4 False 172417 0.603521
7 4 True 113268 0.396479
Upvotes: 3
Reputation: 182
My solution would be like this:
Importing relevant libraries
import pandas as pd
import numpy as np
Creating a Dataframe df
d = {'subject':[1,1,2,2,3,3],'bool':[False,True,False,True,False,True],
'count':[329232,73896,268338,76424,186167,27078]}
df = pd.DataFrame(d)
Using groupby
and reset_index
table_sum= df.groupby('subject').sum().reset_index()[['subject','count']]
Zip the groupby
output and make it asdictionary
and get the frequency using map
look_1 = (dict(zip(table_sum['subject'],table_sum['count'])))
df['cu_sum'] = df['subject'].map(look_1)
df['relative_frequency'] = df['count']/df['cu_sum']
Output
print(df)
subject bool count cu_sum relative_frequency
0 1 False 329232 403128 0.816693
1 1 True 73896 403128 0.183307
2 2 False 268338 344762 0.778328
3 2 True 76424 344762 0.221672
4 3 False 186167 213245 0.873019
5 3 True 27078 213245 0.126981
Upvotes: 1
Reputation: 913
#create df
d = {'subject': [1, 1, 2, 2, 3, 3, 4, 4], 'bool': [False, True, False, True, False, True, False, True], 'Count': [329232,73896
,268338,76424,186167,27078,172417,113268]}
df = pd.DataFrame(d)
#get sums for each subject group
sums = pd.DataFrame(df.groupby(['subject'])['Count'].sum().reset_index())
sums.columns = ['subject', 'sums']
#merge sums to original df
df_sums = df.merge(sums, how='left', on='subject')
#calculate percentages for each row
df_sums['percent'] = df_sums['Count']/df_sums['sums']
df_sums
Upvotes: 0