pandas percent of group by two columns

Question

i am looking for a solution to get in the result like the column "result percent of colA". as example, the value represents the percentage of cats (colB) in the month januar (colA), compared to cats and dogs in januar.

import pandas as pd

# set up dataframe
df_ex = pd.DataFrame({'colA':['2021-01-31', '2021-01-31', '2021-01-31', '2021-02-28', 
                              '2021-02-28', '2021-02-28', '2021-03-31', '2021-03-31'],
                   'colB':['cat', 'cat', 'dog', 'cat', 'dog', 'cat', 'cat', 'dog'],
                   'colC':[1,2,3,4,4,5,6,7], })
df_ex = df_ex.groupby(['colA', 'colB']).sum() 
df_ex = df_ex.reset_index() 
df_ex['result percent of colA'] = [0.5, 0.5, 0.69, 0.31, 0.46, 0.54] #expecte result

thanks for help!

jezrael · Accepted Answer

You can divide column by sum of colC per colA with same size like original with GroupBy.transform:

df_ex['perc'] = df_ex['colC'].div(df_ex.groupby('colA')['colC'].transform('sum')).round(2)

print (df_ex)
         colA colB  colC  perc
0  2021-01-31  cat     3  0.50
1  2021-01-31  dog     3  0.50
2  2021-02-28  cat     9  0.69
3  2021-02-28  dog     4  0.31
4  2021-03-31  cat     6  0.46
5  2021-03-31  dog     7  0.54

Details:

print (df_ex.groupby('colA')['colC'].transform('sum'))
0     6
1     6
2    13
3    13
4    13
5    13
Name: colC, dtype: int64

pandas percent of group by two columns

Answers (1)

Related Questions