Alex
Alex

Reputation: 1131

pandas percent of group by two columns

i am looking for a solution to get in the result like the column "result percent of colA". as example, the value represents the percentage of cats (colB) in the month januar (colA), compared to cats and dogs in januar.

import pandas as pd

# set up dataframe
df_ex = pd.DataFrame({'colA':['2021-01-31', '2021-01-31', '2021-01-31', '2021-02-28', 
                              '2021-02-28', '2021-02-28', '2021-03-31', '2021-03-31'],
                   'colB':['cat', 'cat', 'dog', 'cat', 'dog', 'cat', 'cat', 'dog'],
                   'colC':[1,2,3,4,4,5,6,7], })
df_ex = df_ex.groupby(['colA', 'colB']).sum() 
df_ex = df_ex.reset_index() 
df_ex['result percent of colA'] = [0.5, 0.5, 0.69, 0.31, 0.46, 0.54] #expecte result

thanks for help!

Upvotes: 2

Views: 61

Answers (1)

jezrael
jezrael

Reputation: 862681

You can divide column by sum of colC per colA with same size like original with GroupBy.transform:

df_ex['perc'] = df_ex['colC'].div(df_ex.groupby('colA')['colC'].transform('sum')).round(2)

print (df_ex)
         colA colB  colC  perc
0  2021-01-31  cat     3  0.50
1  2021-01-31  dog     3  0.50
2  2021-02-28  cat     9  0.69
3  2021-02-28  dog     4  0.31
4  2021-03-31  cat     6  0.46
5  2021-03-31  dog     7  0.54

Details:

print (df_ex.groupby('colA')['colC'].transform('sum'))
0     6
1     6
2    13
3    13
4    13
5    13
Name: colC, dtype: int64

Upvotes: 2

Related Questions