Reputation: 297
I created a data frame below:
gender_mix = pd.DataFrame({
'user': df.user_type,
'generation': df.generation,
'gender': df.gender,
'record': 1
})\
.groupby(by=['user', 'generation', 'gender'], as_index=False).agg({'record': np.sum})\
.reset_index(drop=True)
user generation gender record
0 Customer baby_boomer Female 19458
1 Customer baby_boomer Male 37510
2 Customer gen_x Female 75333
3 Customer gen_x Male 157443
4 Customer gen_y Female 340061
5 Customer gen_y Male 607945
6 Customer gen_z Female 44980
7 Customer gen_z Male 93751
8 Customer silent Female 159
9 Customer silent Male 608
10 Subscriber baby_boomer Female 530056
11 Subscriber baby_boomer Male 1695197
12 Subscriber gen_x Female 1119945
13 Subscriber gen_x Male 3811786
14 Subscriber gen_y Female 2319716
15 Subscriber gen_y Male 6304151
16 Subscriber gen_z Female 74390
17 Subscriber gen_z Male 284011
18 Subscriber silent Female 20133
19 Subscriber silent Male 59013
I would like to calculate % of record by gender. For example:
user: Customer > generation: baby_boomer > gender: Female 19,458 & Male 37,510. Female is 34% & Male is 66% after rounding for this group of user & generation.
Below is my current solution:
# create a new data frame which calculate total record by group of user & generation
t = gender_mix.groupby(by=['user', 'generation']).sum()\
.reset_index()\
.rename(columns={'record': 'total_by_gen'})
# merge original data frame & 't'
# calculate new variable 'percent' by dividing 'record' with 'total_by_gen'
gender_mix = pd.merge(left=gender_mix, right=t, on=['user', 'generation'])\
.assign(percent = lambda data: data.record * 100 / data.total_by_gen)\
.assign(percent = lambda data: data.percent.round().astype('int'))
Here is part of the new data frame.
user generation gender record total_by_gen percent
0 Customer baby_boomer Female 19458 56968 34
1 Customer baby_boomer Male 37510 56968 66
2 Customer gen_x Female 75333 232776 32
3 Customer gen_x Male 157443 232776 68
4 Customer gen_y Female 340061 948006 36
5 Customer gen_y Male 607945 948006 64
6 Customer gen_z Female 44980 138731 32
7 Customer gen_z Male 93751 138731 68
8 Customer silent Female 159 767 21
9 Customer silent Male 608 767 79
I wonder if there is a way to convert the 'record' column in original data frame to 'percentage by gender' by applying a function?
Upvotes: 1
Views: 451
Reputation: 9257
You can use transform
after the groupby
and assign the results directly to the column 'record'
:
gender_mix['record'] = gender_mix\
.groupby(['user', 'generation'])['record']\
.transform(lambda x: round((x/sum(x)*100)).astype(int))
Upvotes: 1