Reputation: 3461
I'm trying to group a pandas dataframe by a column and then also calculate the mean for multiple columns. In the sample below I would like to group by the 'category' column and then calculate the mean for the 'score' and 'priority' columns. All three columns should be in the resulting dataframe.
I am able to group and calculate the mean for the first column, but I don't know how to add the second column. My attempt is below.
Any guidance greatly appreciated.
import pandas as pd
data = [['A', 2, 1], ['A', 4, 2], ['B', 5, 3], ['B', 2, 3]]
df = pd.DataFrame(data, columns=['category', 'score', 'priority'])
print(df)
# This fails:
results_df = df.groupby('category')['score'].agg(['mean',])['priority'].agg(['mean',])
print(results_df)
Upvotes: 3
Views: 3907
Reputation: 152
The first part of your code correctly builds and prints the DataFrame:
  category  score  priority
0        A      2         1
1        A      4         2
2        B      5         3
3        B      2         3
Now add this line:
df.groupby("category").mean(numeric_only=True)
and you will see:
          score  priority
category
A           3.0       1.5
B           3.5       3.0
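which is probably what you're looking for. Calling mean(numeric_only=True) on the grouped DataFrame calculates the per-group mean of every numeric column. (With this data you can leave the argument out, since both remaining columns are numeric. If the frame also had non-numeric columns, pandas 1.5 would show a deprecation warning and pandas 2.0 and later would raise a TypeError.)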
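If you want 'category' as a regular column rather than the index, and want to name the columns to average explicitly, a minimal sketch (reusing the question's sample data; the names results_df and named_df are just illustrative) could look like this:

```python
import pandas as pd

data = [['A', 2, 1], ['A', 4, 2], ['B', 5, 3], ['B', 2, 3]]
df = pd.DataFrame(data, columns=['category', 'score', 'priority'])

# Select both columns with a list (note the double brackets), then take
# the per-group mean. as_index=False keeps 'category' as a normal column.
results_df = df.groupby('category', as_index=False)[['score', 'priority']].mean()
print(results_df)

# The same result written with named aggregation, which is handy when
# different columns need different functions.
named_df = df.groupby('category', as_index=False).agg(
    score=('score', 'mean'),
    priority=('priority', 'mean'),
)
print(named_df)
```

The original attempt fails because df.groupby('category')['score'].agg(['mean']) already returns a DataFrame whose only column is 'mean', so indexing it with ['priority'] raises a KeyError.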
Upvotes: 1