Reputation: 317
I'm doing the Titanic Kaggle problem.
I'm trying to return both count & sum in the same query against my dataset. I know I have to use the .agg
function somewhere in my query but I cannot figure out where.
SexI
is a column which relates to the passenger sex and AgeCat
is a column I created to better communicate the Age data:
df[['AgeCat', 'SexI', 'PropSurvive']].groupby(['SexI','AgeCat'], as_index=False).agg(['mean', 'count']).sort_values(by=['SexI','AgeCat'], ascending=True)
So, in this query, when I replace agg(['mean', 'count'])
with .mean
returns kind of what I want, which is the mean of the survive for the different groups i'm analyzing. When I replace with .count
, it returns a count.
But I cannot get them both to return in the same query. Right now it's giving me a KeyError: 'SexI'
, but I do not really see how.
Btw is this similar to an SQL query where you'd select count(ROW) & mean(ROW)
?
Upvotes: 0
Views: 52
Reputation: 862571
I think need reset_index
and add column PropSurvive
to list after groupby
:
(df.groupby(['SexI','AgeCat'])['PropSurvive']
.agg(['mean', 'count'])
.reset_index()
.sort_values(by=['SexI','AgeCat'], ascending=True))
Upvotes: 1