Fookatchu

Reputation: 7465

How to get the column names of a DataFrame GroupBy object?

How can I get the column names of a GroupBy object? The object does not provide a columns property. I can aggregate the object first or extract a DataFrame with the get_group() method, but that is either a computationally expensive hack or error-prone if some columns are dropped (string columns, for example).

Upvotes: 16

Views: 10954

Answers (3)

tyersome

Reputation: 218

Another alternative is to use the indices attribute, which is a dictionary. Its keys are the group labels: tuples when grouping by more than one level, otherwise single values such as strings.

Examples:

>>> import pandas as pd

>>> df = pd.DataFrame(
...                     [
...                         [1, 2, 3], 
...                         [4, 5, 6], 
...                         [7, 8, 9]
...                     ], 
...                     index=pd.MultiIndex.from_tuples( [('i', 1), ('ii', 2), ('iii', 3)] ),
...                     columns=pd.MultiIndex.from_tuples( [('A', 'a'), ('B', 'b'), ('C', 'c')] )
... )

>>> df.columns.names = ['UPPER', 'lower']

>>> df
UPPER  A  B  C
lower  a  b  c
i   1  1  2  3
ii  2  4  5  6
iii 3  7  8  9

>>> grps = df.groupby(axis='columns', level=['UPPER','lower'])

>>> grps.indices
{('A', 'a'): array([0]), ('B', 'b'): array([1]), ('C', 'c'): array([2])}

>>> grps.indices.keys()
dict_keys([('A', 'a'), ('B', 'b'), ('C', 'c')])

>>> grps2 = df.groupby(axis='columns', level='UPPER')

>>> grps2.indices
{'A': array([0]), 'B': array([1]), 'C': array([2])}

>>> grps2.indices.keys()
dict_keys(['A', 'B', 'C'])
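
Note that the axis argument of DataFrame.groupby is deprecated in recent pandas releases (since 2.1, as far as I know). A minimal sketch of the same idea for newer versions, assuming it is acceptable to transpose the frame so the column levels become row levels; the variable names here are illustrative only:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=pd.MultiIndex.from_tuples([("i", 1), ("ii", 2), ("iii", 3)]),
    columns=pd.MultiIndex.from_tuples(
        [("A", "a"), ("B", "b"), ("C", "c")], names=["UPPER", "lower"]
    ),
)

# Transpose so that the column levels become row levels, then group by them.
# This stands in for groupby(axis='columns', ...) on newer pandas.
two_level_keys = sorted(df.T.groupby(level=["UPPER", "lower"]).indices)
one_level_keys = sorted(df.T.groupby(level="UPPER").indices)

print(two_level_keys)  # tuples, one per (UPPER, lower) pair
print(one_level_keys)  # plain labels from the UPPER level
```

The keys are still the group labels, so this only yields column names when the grouping itself runs over the column levels.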

Upvotes: 0

JohnT

Reputation: 127

As Ayhan said, g.obj.columns does return column names, but they are those of the original DataFrame. The columns of an aggregated result, such as g.any().columns, are not the same.

Specifically, g.any().columns does NOT include the columns used to create the groupby whereas g.obj.columns does.

So whether this difference matters depends on how you use the result. In my case I can be a bit less pedantic, but for distributable code you may want to be precise.

In [109]: ww.grp.any().columns
Out[109]: 
Index(['inode', 'size', 'drvid', 'path', 'hash', 'ftype', 'id', 'md5',
       'parent', 'top'],
      dtype='object')

In [110]: ww.grp.any().index.name
Out[110]: 'file'

In [111]: ww.grp.obj.columns
Out[111]: 
Index(['inode', 'size', 'drvid', 'path', 'hash', 'ftype', 'file', 'id', 'md5',
       'parent', 'top'],
      dtype='object')
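
The same difference can be reproduced on a small frame. This is only a sketch with made-up data; the column names are borrowed from the output above, and the default as_index=True is assumed:

```python
import pandas as pd

# Hypothetical data, not the original ww frame.
df = pd.DataFrame({
    "file": ["a.txt", "a.txt", "b.txt"],
    "size": [10, 20, 30],
    "inode": [1, 2, 3],
})
g = df.groupby("file")

source_cols = list(g.obj.columns)  # columns of the DataFrame behind the GroupBy
agg_cols = list(g.any().columns)   # columns of the aggregated result
agg_index_name = g.any().index.name

print(source_cols)     # includes the grouping column 'file'
print(agg_cols)        # 'file' is missing here ...
print(agg_index_name)  # ... because it became the index of the result
```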

Upvotes: 3

user2285236

Reputation:

Looking at the source code of __getitem__, it seems that you can get the column names with

g.obj.columns

where g is the groupby object. Apparently g.obj links to the DataFrame.
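
A minimal sketch of this on a toy frame (the data and column names are invented for illustration). Nothing is aggregated, and the string column stays in the list:

```python
import pandas as pd

# Hypothetical example data with a non-numeric column.
df = pd.DataFrame({
    "key": ["a", "a", "b"],
    "num": [1, 2, 3],
    "txt": ["x", "y", "z"],
})
g = df.groupby("key")

# g.obj is the DataFrame the GroupBy object was built from,
# so its columns are available without computing anything per group.
all_columns = list(g.obj.columns)
print(all_columns)
```

This list includes the grouping column itself, so filter it out if only the non-key columns are wanted.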

Upvotes: 18
