Mathieu Dutour Sikiric

Reputation: 624

Pandas missing values and groupby boolean

I found some strange behavior with groupby and missing values.

import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [2, 1, 1, 2, 2], "B": [False, np.nan, False, np.nan, False]})

Grouping by A and counting the unique values of B, I obtain:

>>> df.groupby('A')['B'].nunique()
A
1    1
2    2
Name: B, dtype: int64

Is this a bug in pandas? nunique defaults to dropna=True, so NaN should not be counted. I would therefore expect 1 unique value in each group, not 2 for group 2.
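
To make the expectation concrete, here is a minimal sketch of how dropna affects a plain Series holding the same values as group 2. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Same values that group A == 2 contains in column B.
s = pd.Series([False, np.nan, False])

# dropna=True is the default: NaN is excluded from the count.
default_count = s.nunique()

# dropna=False counts NaN as one additional distinct value.
with_nan = s.nunique(dropna=False)

print(default_count, with_nan)
```

On a plain Series the default count is 1, which is what I expected the grouped call to return as well.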

Upvotes: 2

Views: 96

Answers (1)

jezrael

Reputation: 863166

I think this is a bug. A possible workaround is to pass Series.nunique to agg:

print(df.groupby('A')['B'].agg(pd.Series.nunique))

Or:

print(df.groupby('A')['B'].apply(pd.Series.nunique))
A
1    1
2    1
Name: B, dtype: int64
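
Another option, sketched here under the assumption that you only want to count non-missing values, is to drop the NaN entries explicitly inside each group. This does not rely on how the grouped nunique treats missing values. The helper name nunique_non_missing is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [2, 1, 1, 2, 2], "B": [False, np.nan, False, np.nan, False]})

def nunique_non_missing(s):
    # Hypothetical helper: remove NaN first, then count distinct values.
    return s.dropna().nunique()

# Apply the helper to column B within each group of A.
counts = df.groupby("A")["B"].apply(nunique_non_missing)
print(counts)
```

Both groups should come out with a count of 1, the same as the agg and apply variants above.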

Upvotes: 1
