Reputation: 5
I have a data frame df and I use several columns from it to groupby:
df[['col1', 'col2', 'col3', 'col4']].groupby(['col1', 'col2']).agg(['mean', 'count'])
In the above way I get the table (data frame) that I need. What is missing is an additional column that contains number of rows in each group greater than mean(average) of that group. in fact I want something like:
col1 col2 mean count above_Mean
A E 4 10 6
F 3 20 4
B G 7 30 15
I tried above return command which gives me everything except the above mean count of per group.
df[['col1', 'col2', 'col3', 'col4']].groupby(['col1', 'col2']).agg(['mean', 'count'])
output:
col1 col2 mean count above_Mean
A E 4 10 6
F 3 20 4
B G 7 30 15
Upvotes: 0
Views: 358
Reputation: 88285
You could also aggregate using a lambda
function as the following:
df.groupby(['col1', 'col2']).agg(['mean', 'count',
lambda x: (x > x.mean()).sum()])
Upvotes: 1