Reputation: 4521

Get percentage based on groupby in Pandas?

I have a pandas DataFrame like this:

subject bool Count
1   False   329232  
1   True    73896   
2   False   268338  
2   True    76424   
3   False   186167  
3   True    27078   
4   False   172417  
4   True    113268

I would like to turn Count into percents for each subject group. So for example, row 1 would be 329232 / (329232 + 73896) = 0.816 and row 2 would be 73896/ (329232 + 73896) = 0.183. Then the total would change for group 2, and so on.

Is this possible to do by a groupby? I tried iterating over the rows with little success.

Upvotes: 5

Answers (3)

BhishanPoudel

Reputation: 17144

This works for me:


import numpy as np
import pandas as pd

# data
df = pd.DataFrame({'subject': [1, 1, 2, 2, 3, 3, 4, 4],
                   'bool': [False, True, False, True, False, True, False, True],
                   'Count': [329232, 73896, 268338, 76424, 186167, 27078, 172417, 113268]})

# answer
df['Per_Subject_Count_Pct'] = df['Count'].div(
    df.groupby('subject')['Count'].transform(lambda x: x.sum()))
print(df)

Gives:

   subject   bool   Count  Per_Subject_Count_Pct
0        1  False  329232               0.816693
1        1   True   73896               0.183307
2        2  False  268338               0.778328
3        2   True   76424               0.221672
4        3  False  186167               0.873019
5        3   True   27078               0.126981
6        4  False  172417               0.603521
7        4   True  113268               0.396479

Upvotes: 3

Ryan Ni

Reputation: 182

My solution would be like this:

Importing relevant libraries

import pandas as pd
import numpy as np

Creating a Dataframe df

d = {'subject':[1,1,2,2,3,3],'bool':[False,True,False,True,False,True],
'count':[329232,73896,268338,76424,186167,27078]}
df = pd.DataFrame(d)

Using groupby and reset_index

table_sum= df.groupby('subject').sum().reset_index()[['subject','count']]

Zip the groupby output and make it asdictionary and get the frequency using map

look_1 = (dict(zip(table_sum['subject'],table_sum['count'])))
df['cu_sum'] = df['subject'].map(look_1)
df['relative_frequency'] = df['count']/df['cu_sum']

Output

print(df)

       subject   bool   count  cu_sum  relative_frequency
    0        1  False  329232  403128            0.816693
    1        1   True   73896  403128            0.183307
    2        2  False  268338  344762            0.778328
    3        2   True   76424  344762            0.221672
    4        3  False  186167  213245            0.873019
    5        3   True   27078  213245            0.126981

Upvotes: 1

kudeh

Reputation: 913

#create df
d = {'subject': [1, 1, 2, 2, 3, 3, 4, 4], 'bool': [False, True, False, True, False, True, False, True], 'Count': [329232,73896
  ,268338,76424,186167,27078,172417,113268]}

df = pd.DataFrame(d)

#get sums for each subject group
sums = pd.DataFrame(df.groupby(['subject'])['Count'].sum().reset_index())
sums.columns = ['subject', 'sums']

#merge sums to original df
df_sums = df.merge(sums, how='left', on='subject')

#calculate percentages for each row
df_sums['percent'] = df_sums['Count']/df_sums['sums']

df_sums

Upvotes: 0

Get percentage based on groupby in Pandas?

Answers (3)

Related Questions