I have a Pandas dataframe like this:
import pandas as pd
df = pd.DataFrame(
{'gender':['F','F','F','F','F','M','M','M','M','M'],
'mature':[0,1,0,0,0,1,1,1,0,1],
'cta' :[1,1,0,1,0,0,0,1,0,1]}
)
df['gender'] = df['gender'].astype('category')
df['mature'] = df['mature'].astype('category')
df['cta'] = pd.to_numeric(df['cta'])
df
I calculated the sum (how many times people clicked) and the total (the number of messages sent). How can I also calculate the percentage, defined as clicks/total, and get the result as a DataFrame?
temp_groupby = df.groupby('gender').agg(
    {'cta': [('clicks', 'sum'), ('total', 'count')]}
)
temp_groupby
I think you need the average: because cta holds only 0 and 1, clicks/total is the same as its mean. Add a new tuple to the list:
temp_groupby = df.groupby('gender').agg(
    {'cta': [('clicks', 'sum'), ('total', 'count'), ('perc', 'mean')]}
)
print(temp_groupby)
          cta
       clicks total perc
gender
F           3     5  0.6
M           2     5  0.4
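To see why the mean gives the same number: for a 0/1 column, sum counts the 1s and count counts the non-missing rows, so sum/count is exactly the mean. A quick check, using plain string labels instead of the question's categorical dtype:

```python
import pandas as pd

df = pd.DataFrame({
    'gender': ['F'] * 5 + ['M'] * 5,
    'cta':    [1, 1, 0, 1, 0, 0, 0, 1, 0, 1],
})

grouped = df.groupby('gender')['cta']

# sum counts the 1s, count counts the non-missing rows
ratio = grouped.sum() / grouped.count()
# the mean of a 0/1 column is the share of 1s
mean = grouped.mean()

print(pd.concat({'clicks/total': ratio, 'mean': mean}, axis=1))
```

Note that sum, count and mean all skip NaN, so the equality still holds with missing values; if total should count every row including missing ones, size is the aggregation to use instead.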
To avoid a MultiIndex in the columns, select the column after groupby:
temp_groupby = df.groupby('gender')['cta'].agg(
    [('clicks', 'sum'), ('total', 'count'), ('perc', 'mean')]
).reset_index()
print(temp_groupby)
  gender  clicks  total  perc
0      F       3      5   0.6
1      M       2      5   0.4
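Alternatively, if you want to keep the question's original aggregation and follow the clicks/total definition literally, you can drop the outer 'cta' level afterwards and divide the two columns. A minimal sketch, assuming the question's df (the 'mature' column is left out since it isn't used, and observed=True is an addition that silences the categorical-grouping FutureWarning in recent pandas versions):

```python
import pandas as pd

df = pd.DataFrame({
    'gender': ['F'] * 5 + ['M'] * 5,
    'cta':    [1, 1, 0, 1, 0, 0, 0, 1, 0, 1],
})
df['gender'] = df['gender'].astype('category')

# The question's aggregation: columns are ('cta', 'clicks') and ('cta', 'total')
temp_groupby = df.groupby('gender', observed=True).agg(
    {'cta': [('clicks', 'sum'), ('total', 'count')]}
)

# Drop the outer 'cta' level so the columns become plain 'clicks' and 'total'
temp_groupby.columns = temp_groupby.columns.droplevel(0)

# Percentage as defined in the question: clicks / total
temp_groupby['perc'] = temp_groupby['clicks'] / temp_groupby['total']

# Turn the 'gender' index back into an ordinary column
result = temp_groupby.reset_index()
print(result)
```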
Or use named aggregation:
temp_groupby = df.groupby('gender', as_index=False).agg(
    clicks=('cta', 'sum'),
    total=('cta', 'count'),
    perc=('cta', 'mean'),
)
print(temp_groupby)
  gender  clicks  total  perc
0      F       3      5   0.6
1      M       2      5   0.4
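Because gender is cast to category in the question, groupby also takes an observed parameter: with observed=False every category gets a row even if it never occurs in the data, and recent pandas versions emit a FutureWarning when it is not passed explicitly for a categorical grouper. A small sketch using the named aggregation above, with a hypothetical unused category 'X' added only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'gender': ['F'] * 5 + ['M'] * 5,
    'cta':    [1, 1, 0, 1, 0, 0, 0, 1, 0, 1],
})
# 'X' is a hypothetical category that never appears in the data
df['gender'] = pd.Categorical(df['gender'], categories=['F', 'M', 'X'])


def summarize(observed):
    # Same named aggregation as above, with observed passed explicitly
    return df.groupby('gender', as_index=False, observed=observed).agg(
        clicks=('cta', 'sum'),
        total=('cta', 'count'),
        perc=('cta', 'mean'),
    )


seen_only = summarize(observed=True)   # rows for F and M only
all_cats = summarize(observed=False)   # also a row for X, whose perc is NaN
print(seen_only)
print(all_cats)
```

For the question's data every category occurs, so both settings give the same two rows; passing observed explicitly just makes the intent clear and keeps the output stable across pandas versions.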