Reputation: 624
I found some strange behavior with groupby and missing values.
import numpy as np
import pandas as pd
df = pd.DataFrame({"A": [2, 1, 1, 2, 2], "B": [False, np.nan, False, np.nan, False]})
Now computing the groupby I obtain:
>>> df.groupby('A')['B'].nunique()
A
1 1
2 2
Name: B, dtype: int64
Is this a bug in pandas? nunique defaults to dropna=True, so NaN should be ignored and I would expect a count of 1 for each group, not 2 for group 2.
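To make the expectation concrete, here is a minimal sketch of how dropna affects Series.nunique on a single group's values, outside of any groupby. The Series `s` is a hypothetical stand-in for the values of group 2, not something taken from the original frame by code.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the values of column B in group A == 2.
s = pd.Series([False, np.nan, False])

# Default behaviour (dropna=True): NaN is not counted as a distinct value.
with_dropna = s.nunique()

# With dropna=False, NaN counts as one additional distinct value.
without_dropna = s.nunique(dropna=False)

print(with_dropna, without_dropna)
```

If the groupby version behaved like the first call, both groups would report 1.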
Upvotes: 2
Views: 96
Reputation: 863166
I think this is a bug. A possible workaround is to pass Series.nunique:
print (df.groupby('A')['B'].agg(pd.Series.nunique))
Or:
print (df.groupby('A')['B'].apply(pd.Series.nunique))
A
1 1
2 1
Name: B, dtype: int64
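As a self-contained sketch of the workaround, the snippet below rebuilds the frame from the question and computes the per-group count in two ways: by passing pd.Series.nunique to agg, and by dropping NaN explicitly before counting. The variable names `workaround` and `explicit` are only illustrative. The output of the built-in groupby nunique is printed without an expected value, since it may depend on the installed pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [2, 1, 1, 2, 2],
    "B": [False, np.nan, False, np.nan, False],
})

# Workaround from the answer: call Series.nunique on each group.
workaround = df.groupby("A")["B"].agg(pd.Series.nunique)

# Equivalent check that drops NaN explicitly before counting.
explicit = df.groupby("A")["B"].apply(lambda s: s.dropna().nunique())

# Built-in groupby nunique, for comparison on your pandas version.
builtin = df.groupby("A")["B"].nunique()

print(workaround)
print(explicit)
print(builtin)
```

If `builtin` differs from the other two on your installation, you are probably hitting the behavior described in the question.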
Upvotes: 1