Reputation: 7465
How can I get the column names of a GroupBy object? The object does not provide a columns property. I can aggregate the object first or extract a DataFrame with the get_group() method, but this is either a computationally expensive hack or error-prone if some columns (strings, for example) are dropped.
Upvotes: 16
Views: 10954
Reputation: 218
Another alternative is the indices attribute, which is a dictionary. Its keys are the values of the grouped levels: tuples when grouping on more than one level, otherwise single labels such as strings.
Examples:
>>> import pandas as pd
>>> df = pd.DataFrame(
... [
... [1, 2, 3],
... [4, 5, 6],
... [7, 8, 9]
... ],
... index=pd.MultiIndex.from_tuples( [('i', 1), ('ii', 2), ('iii', 3)] ),
... columns=pd.MultiIndex.from_tuples( [('A', 'a'), ('B', 'b'), ('C', 'c')] )
... )
>>> df.columns.names = ['UPPER', 'lower']
>>> df
UPPER A B C
lower a b c
i 1 1 2 3
ii 2 4 5 6
iii 3 7 8 9
>>> grps = df.groupby(axis='columns', level=['UPPER','lower'])
>>> grps.indices
{('A', 'a'): array([0]), ('B', 'b'): array([1]), ('C', 'c'): array([2])}
>>> grps.indices.keys()
dict_keys([('A', 'a'), ('B', 'b'), ('C', 'c')])
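>>> grps2 = df.groupby(axis='columns', level='UPPER')  # assumed definition: group on the top level only
>>> grps2.indices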
{'A': array([0]), 'B': array([1]), 'C': array([2])}
>>> grps2.indices.keys()
dict_keys(['A', 'B', 'C'])
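For comparison, here is a minimal sketch of what indices holds for an ordinary row-wise groupby. The frame and its column names (file, ftype, size) are made up for illustration. In this case the keys are the group labels found in the key column(s), not the names of the frame's columns, so this approach only yields column names when the grouping is done along the column axis as above.

```python
import pandas as pd

# Hypothetical data, used only to illustrate the shape of `indices`.
df = pd.DataFrame(
    {
        "file": ["a", "b", "a"],
        "ftype": ["x", "y", "x"],
        "size": [1, 2, 3],
    }
)

# One grouping key: the dictionary keys are plain labels.
single = df.groupby("file").indices
single_keys = sorted(single.keys())

# Two grouping keys: the dictionary keys are tuples.
multi = df.groupby(["file", "ftype"]).indices
multi_keys = sorted(multi.keys())

# Each value is an array of the row positions belonging to that group.
rows_for_a = single["a"].tolist()
```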
Upvotes: 0
Reputation: 127
As Ayhan said, g.obj.columns does return columns, but those of the original DataFrame. The columns of the grouped result, as returned by g.any().columns, are not the same.
Specifically, g.any().columns does NOT include the columns used to create the groupby, whereas g.obj.columns does.
Whether this difference matters depends on how you use the result. In my case, I can be a bit less pedantic, but for a distributable piece of code you may want to be precise.
In [109]: ww.grp.any().columns
Out[109]:
Index(['inode', 'size', 'drvid', 'path', 'hash', 'ftype', 'id', 'md5',
'parent', 'top'],
dtype='object')
In [110]: ww.grp.any().index.name
Out[110]: 'file'
In [111]: ww.grp.obj.columns
Out[111]:
Index(['inode', 'size', 'drvid', 'path', 'hash', 'ftype', 'file', 'id', 'md5',
'parent', 'top'],
dtype='object')
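The same difference can be reproduced on a small frame. This is only a sketch: the data below is invented, with column names borrowed from the output above, and it assumes the default as_index=True so that the grouping key ends up in the index of the aggregated result.

```python
import pandas as pd

# Hypothetical stand-in for the frame behind ww.grp.
df = pd.DataFrame(
    {
        "file": ["a", "b", "a"],
        "inode": [10, 20, 30],
        "size": [1, 2, 3],
    }
)
g = df.groupby("file")

# Columns of the original frame, grouping key included.
original_cols = list(g.obj.columns)

# Columns of an aggregated result, grouping key excluded.
aggregated_cols = list(g.any().columns)

# The grouping key moved into the index of the aggregated result.
key_names = list(g.any().index.names)

# Columns present in the original frame but missing after aggregation.
missing_after_agg = [c for c in original_cols if c not in aggregated_cols]
```

If only the non-key columns are needed without running an aggregation, filtering g.obj.columns against the known key names gives the same list here.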
Upvotes: 3
Reputation:
Looking at the source code of __getitem__, it seems that you can get the column names with
g.obj.columns
where g is the groupby object. Apparently g.obj links to the DataFrame.
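A minimal sketch of this approach, assuming a small made-up frame (the column names are hypothetical). Since obj was found by reading the source rather than the documented API, treat it as something that may behave differently across pandas versions.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame(
    {
        "file": ["a", "b", "a"],
        "size": [1, 2, 3],
        "path": ["x", "y", "z"],
    }
)

g = df.groupby("file")

# g.obj refers to the frame that was grouped, so no aggregation is needed
# and non-numeric columns are still listed.
all_columns = list(g.obj.columns)
print(all_columns)
```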
Upvotes: 18