Reputation: 4416
In the DataFrame "data_to_rank", I have a column "r_DTS". data_to_rank['r_DTS'] shows:
Name: r_DTS, dtype: category
Categories (4, object): [Bottom < 2 < Top < Missing]
When I do:
>>> b = data_to_rank.groupby(['r_DTS'])
>>> for key, group in b: print(key)
Bottom
2
Top
Missing
However, when I group by 'r_DTS' together with another variable, the "Missing" category in "r_DTS" disappears.
>>> a = data_to_rank.groupby(['GRADE','r_DTS'])
>>> for key, group in a: print(key)
('HY', 'Bottom')
('HY', '2')
('HY', 'Top')
('IG', 'Bottom')
('IG', '2')
('IG', 'Top')
Where are ('HY', 'Missing') and ('IG', 'Missing')?
Upvotes: 1
Views: 423
Reputation: 294218
When you group by a single categorical column, the grouping includes all of its categories, even the ones with no rows.
When you group by multiple columns, even if all of them are categorical dtypes, you don't get the same behavior: only the combinations that actually occur in the data show up.
To get the missing combinations back, you can construct your own categorical of (GRADE, r_DTS) tuples to group by. This is an example of how to do that:
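import pandas as pd

# Every (GRADE, r_DTS) combination, including ones with no rows
cats = pd.MultiIndex.from_product([
    data_to_rank.GRADE.cat.categories,
    data_to_rank.r_DTS.cat.categories,
]).map(tuple)
# One tuple per row, typed as a categorical over all combinations
categorical_to_group_by = pd.Categorical(
    data_to_rank[['GRADE', 'r_DTS']].apply(tuple, axis=1), cats
)
g = data_to_rank.groupby(categorical_to_group_by)
for name, group in g:
    print(name)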
('HY', 'Bottom')
('HY', 2)
('HY', 'Top')
('HY', 'Missing')
('IG', 'Bottom')
('IG', 2)
('IG', 'Top')
('IG', 'Missing')
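If you only need to walk every combination, empty ones included, a version-independent alternative is to skip groupby and filter with boolean masks. This is a minimal sketch, not the approach above: the sample frame is made up to resemble the question's data, and the names grade_levels, dts_levels and group_sizes are only illustrative.

```python
import itertools

import pandas as pd

# Hypothetical sample data shaped like the question's frame; "Missing" is a
# declared category of r_DTS that no row actually uses.
data_to_rank = pd.DataFrame({
    "GRADE": pd.Categorical(["HY", "HY", "HY", "IG", "IG", "IG"]),
    "r_DTS": pd.Categorical(
        ["Bottom", "2", "Top", "Bottom", "2", "Top"],
        categories=["Bottom", "2", "Top", "Missing"],
        ordered=True,
    ),
})

# All declared categories, whether or not any row uses them
grade_levels = list(data_to_rank["GRADE"].cat.categories)
dts_levels = list(data_to_rank["r_DTS"].cat.categories)

# Visit every (GRADE, r_DTS) pair; unused pairs yield an empty sub-frame
group_sizes = {}
for grade, dts in itertools.product(grade_levels, dts_levels):
    mask = (data_to_rank["GRADE"] == grade) & (data_to_rank["r_DTS"] == dts)
    group = data_to_rank[mask]
    group_sizes[(grade, dts)] = len(group)

for key, size in group_sizes.items():
    print(key, size)
```

This scans the frame once per combination, so it suits a small number of categories; for many categories the categorical-of-tuples approach above does the split in a single pass.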
Upvotes: 1