Reputation: 1131
i am looking for a solution to get in the result like the column "result percent of colA". as example, the value represents the percentage of cats (colB) in the month januar (colA), compared to cats and dogs in januar.
import pandas as pd
# set up dataframe
df_ex = pd.DataFrame({'colA':['2021-01-31', '2021-01-31', '2021-01-31', '2021-02-28',
'2021-02-28', '2021-02-28', '2021-03-31', '2021-03-31'],
'colB':['cat', 'cat', 'dog', 'cat', 'dog', 'cat', 'cat', 'dog'],
'colC':[1,2,3,4,4,5,6,7], })
df_ex = df_ex.groupby(['colA', 'colB']).sum()
df_ex = df_ex.reset_index()
df_ex['result percent of colA'] = [0.5, 0.5, 0.69, 0.31, 0.46, 0.54] #expecte result
thanks for help!
Upvotes: 2
Views: 61
Reputation: 862681
You can divide column by sum of colC
per colA
with same size like original with GroupBy.transform
:
df_ex['perc'] = df_ex['colC'].div(df_ex.groupby('colA')['colC'].transform('sum')).round(2)
print (df_ex)
colA colB colC perc
0 2021-01-31 cat 3 0.50
1 2021-01-31 dog 3 0.50
2 2021-02-28 cat 9 0.69
3 2021-02-28 dog 4 0.31
4 2021-03-31 cat 6 0.46
5 2021-03-31 dog 7 0.54
Details:
print (df_ex.groupby('colA')['colC'].transform('sum'))
0 6
1 6
2 13
3 13
4 13
5 13
Name: colC, dtype: int64
Upvotes: 2