Counting and returning mean in Pandas

Question

I'm doing the Titanic Kaggle problem.

I'm trying to return both count & sum in the same query against my dataset. I know I have to use the .agg function somewhere in my query but I cannot figure out where.

SexI is a column which relates to the passenger sex and AgeCat is a column I created to better communicate the Age data:

df[['AgeCat', 'SexI', 'PropSurvive']].groupby(['SexI','AgeCat'], as_index=False).agg(['mean', 'count']).sort_values(by=['SexI','AgeCat'], ascending=True)

So, in this query, when I replace agg(['mean', 'count']) with .mean returns kind of what I want, which is the mean of the survive for the different groups i'm analyzing. When I replace with .count, it returns a count.

But I cannot get them both to return in the same query. Right now it's giving me a KeyError: 'SexI' , but I do not really see how.

Btw is this similar to an SQL query where you'd select count(ROW) & mean(ROW)?

jezrael · Accepted Answer

I think need reset_index and add column PropSurvive to list after groupby:

(df.groupby(['SexI','AgeCat'])['PropSurvive']
   .agg(['mean', 'count'])
   .reset_index() 
   .sort_values(by=['SexI','AgeCat'], ascending=True))

Counting and returning mean in Pandas

Answers (1)

Related Questions