Reputation: 487
I'm having trouble using .groupby and .agg on a column whose name is a tuple.
Here is the .info() output:
account_aggregates.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 9713 entries, 0 to 9712
Data columns (total 14 columns):
NATIVEACCOUNTKEY 9713 non-null int64
(POLL, sum) 9713 non-null int64
num_cancellations 8 non-null float64
I'm trying to do something like this:
session_deciles2_grouped = account_aggregates.groupby(('POLL','sum'))
and this:
session_deciles22=session_deciles2_grouped[('POLL','sum')].agg(['mean','count'])
but the column isn't being recognized; I keep getting a KeyError.
Upvotes: 0
Views: 257
Reputation: 54340
account_aggregates.groupby([('POLL','sum'),]) would be required here.
The reason account_aggregates.groupby(('POLL','sum')) won't work is that ('POLL','sum') is a collection, so groupby reads it as two separate keys: a column called POLL and a column called sum. It then tries to group by both columns, and neither exists.
When we put ('POLL','sum') in a list, it means group by the single column named ('POLL','sum').
Therefore, account_aggregates.groupby([('POLL','sum'),]) or account_aggregates.groupby((('POLL','sum'),)) will work.
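To make the list form concrete, here is a minimal sketch. The data values are made up for illustration; only the column names come from the question. It aggregates NATIVEACCOUNTKEY rather than the grouping column itself, which is my own choice for the example. Note that how pandas interprets a bare tuple passed to groupby has changed between releases, so the tuple forms may behave differently on your version; the list form shown here is the one I would rely on.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
# tupleize_cols=False keeps a flat Index whose first label is a tuple,
# instead of letting pandas try to build a MultiIndex.
columns = pd.Index([("POLL", "sum"), "NATIVEACCOUNTKEY"], tupleize_cols=False)
account_aggregates = pd.DataFrame(
    [[10, 1], [10, 2], [20, 3], [20, 4]],
    columns=columns,
)

# Wrapping the tuple in a list makes it one key:
# the single column named ('POLL', 'sum').
grouped = account_aggregates.groupby([("POLL", "sum")])

# Aggregate another column for each distinct value of ('POLL', 'sum').
summary = grouped["NATIVEACCOUNTKEY"].agg(["mean", "count"])
print(summary)
```

Each distinct value in the ('POLL', 'sum') column becomes one row of summary, with the mean and count of NATIVEACCOUNTKEY for that group.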
Upvotes: 1