Excluding a column after grouping by in pandas

Question

So I was wondering WHY the following is not possible and HOW to get around it.

I've taken a data frame, grouped it by one column, and assigned the result to a new variable. When I then try to select columns from that variable, it produces an error:

import pandas as pd

df = pd.DataFrame({'group': list('aaaabbbb'),
                   'val': [1, 3, 3, 2, 5, 6, 6, 2],
                   'id': [1, 1, 2, 2, 2, 3, 3, 3]})
df
newdf = df.groupby("group")
newdf.loc[:, newdf.columns != 'val']  # raises the AttributeError shown below

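The same thing happens when grouping by two columns:

df = pd.DataFrame({'group1': list('aaaabbbb'),
                   'group2': list('ccccbbbb'),
                   'val': [1, 3, 3, 2, 5, 6, 6, 2],
                   'id': [1, 1, 2, 2, 2, 3, 3, 3]})
df
newdf = df.groupby(["group1", "group2"])
newdf.loc[:, newdf.columns != 'val']  # raises the AttributeError shown below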


AttributeError: Cannot access callable attribute 'loc' of 'DataFrameGroupBy' objects, try using the 'apply' method

I use both of these data frames to compute an IQR-like range, as follows:

Q1 = df1.quantile(0.15)
Q3 = df1.quantile(0.85)
IQR = Q3 - Q1
df1 = pd.DataFrame(IQR).reset_index()

jpp · Accepted Answer

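Calling groupby on its own returns a DataFrameGroupBy object rather than a DataFrame, which is why loc is not available. You need to apply an aggregation function to the grouped object, for example sum. In addition, you probably want the result to be a pd.DataFrame in which the groupby columns stay as regular columns instead of becoming the index. This can be achieved by setting as_index=False.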

Try this:

import pandas as pd

df = pd.DataFrame({'group1':list('aaaabbbb'),
                   'group2':list('ccccbbbb'),
                   'val':[1,3,3,2,5,6,6,2],
                   'id':[1,1,2,2,2,3,3,3]})

newdf = df.groupby(['group1', 'group2'], as_index=False).sum()
newdf.loc[:, newdf.columns != 'val']
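If the goal is only to leave val out of the result, the column can also be excluded before aggregating, so no loc selection is needed afterwards. A minimal sketch, assuming sum is the aggregation you want (the names out1 and out2 are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'group1': list('aaaabbbb'),
                   'group2': list('ccccbbbb'),
                   'val': [1, 3, 3, 2, 5, 6, 6, 2],
                   'id': [1, 1, 2, 2, 2, 3, 3, 3]})

# Option 1: drop the unwanted column first, then group and aggregate
out1 = df.drop(columns='val').groupby(['group1', 'group2'], as_index=False).sum()

# Option 2: group first, then select only the wanted columns on the GroupBy object
out2 = df.groupby(['group1', 'group2'], as_index=False)[['id']].sum()

print(out1)
print(out2)
```

Both options should give a DataFrame with the columns group1, group2 and id. Which one reads better is mostly a matter of taste; the second avoids building an intermediate copy of the frame.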

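To see the difference, compare the types before and after aggregating:

newdf = df.groupby(['group1', 'group2'])
print(type(newdf))        # a DataFrameGroupBy object (module path varies by pandas version)
print(type(newdf.sum()))  # <class 'pandas.core.frame.DataFrame'>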
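The same rule applies to the quantile range in the question: aggregate first, then do the arithmetic on the resulting DataFrames. A rough sketch, assuming the range is wanted per group for the val and id columns (the question does not define df1, so this is a guess at the intent; q_low, q_high and spread are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'group1': list('aaaabbbb'),
                   'group2': list('ccccbbbb'),
                   'val': [1, 3, 3, 2, 5, 6, 6, 2],
                   'id': [1, 1, 2, 2, 2, 3, 3, 3]})

# Select the numeric columns on the GroupBy object before aggregating
grouped = df.groupby(['group1', 'group2'])[['val', 'id']]

# Each quantile call is an aggregation, so it returns a regular DataFrame
q_low = grouped.quantile(0.15)
q_high = grouped.quantile(0.85)

# Subtracting aligns on the group index; reset_index turns the groups back into columns
spread = (q_high - q_low).reset_index()
print(spread)
```

The 0.15 and 0.85 cut-offs are taken from the question as written. A conventional interquartile range would use 0.25 and 0.75 instead.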
