Reputation: 4180
Suppose I have a pandas
data frame like the one below:
df = pandas.DataFrame([[1,'A', 'First'],
[3,'A', 'Second'],
[2,'A', 'Third'],
[5,'B', 'Second'],
[4,'B', 'Third']])
Then, if I do the following:
df.groupby([1,2]).count()
the combination of B
and First
will be dropped because it is not in the original data frame.
I wonder if there is a way to NOT drop this combination and instead give it a count of 0
Upvotes: 0
Views: 68
Reputation: 353369
One way would be to create a MultiIndex
which contains them all and then use that to index into your counts:
>>> g = df.groupby([1,2]).count()
>>> g.loc[pd.MultiIndex.from_product(g.index.levels)]
0
A First 1
Second 1
Third 1
B First NaN
Second 1
Third 1
>>> g.loc[pd.MultiIndex.from_product(g.index.levels)].fillna(0)
0
A First 1
Second 1
Third 1
B First 0
Second 1
Third 1
This works because the new MultiIndex contains the cartesian product of the levels:
>>> g.index
MultiIndex(levels=[['A', 'B'], ['First', 'Second', 'Third']],
labels=[[0, 0, 0, 1, 1], [0, 1, 2, 1, 2]],
names=[1, 2])
>>> pd.MultiIndex.from_product(g.index.levels)
MultiIndex(levels=[['A', 'B'], ['First', 'Second', 'Third']],
labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]])
Upvotes: 3