Derek Krantz

Reputation: 487

KeyError in .groupby and .agg when using a tuple column

I'm having trouble using .groupby and .agg on a column whose name is a tuple.

Here is the output of .info():

account_aggregates.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9713 entries, 0 to 9712
Data columns (total 14 columns):
NATIVEACCOUNTKEY           9713 non-null int64
(POLL, sum)              9713 non-null int64
num_cancellations          8 non-null float64

I'm trying to do something like this:

session_deciles2_grouped = account_aggregates.groupby(('POLL','sum'))

and this:

session_deciles22=session_deciles2_grouped[('POLL','sum')].agg(['mean','count'])

but the column isn't being recognized, and I keep getting a KeyError.

Upvotes: 0

Views: 257

Answers (1)

CT Zhu

Reputation: 54340

You need account_aggregates.groupby([('POLL','sum'),]) here.

The reason account_aggregates.groupby(('POLL','sum')) doesn't work is that ('POLL','sum') is a tuple, and groupby treats it as a collection of keys: it looks for one column called POLL and another called sum, and tries to group by both. If those columns don't exist, the lookup fails with a KeyError.

When we put ('POLL','sum') in a list, it means group by a single column named ('POLL','sum').

Therefore, account_aggregates.groupby([('POLL','sum'),]) or account_aggregates.groupby((('POLL','sum'),)) will work.
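
For illustration, here is a minimal sketch of the list form on a toy frame. The data values are made up, and the sketch aggregates num_cancellations rather than the tuple column itself, which is an assumption made to keep the example short. How a bare tuple is interpreted by groupby has varied between pandas releases, so the list form is the spelling that says "one key whose label is a tuple" unambiguously.

```python
import pandas as pd

# Toy stand-in for account_aggregates; the values are invented for illustration.
# One of the column labels is the tuple ("POLL", "sum").
df = pd.DataFrame({
    "NATIVEACCOUNTKEY": [101, 102, 103, 104],
    ("POLL", "sum"): [5, 5, 7, 7],
    "num_cancellations": [1.0, 3.0, 2.0, 4.0],
})

# Wrapping the tuple in a list passes a list containing a single key,
# and that key is the tuple label itself.
grouped = df.groupby([("POLL", "sum")])

# Aggregate another column within each group.
summary = grouped["num_cancellations"].agg(["mean", "count"])
print(summary)
```

The result has one row per distinct value of the ('POLL','sum') column, with the mean and count of num_cancellations in each group.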

Upvotes: 1
