6:[["$","$Le",null,{}],["$","div",null,{"className":"min-h-screen bg-gray-100 p-6","children":[["$","$Lf",null,{}],["$","script",null,{"type":"application/ld+json","dangerouslySetInnerHTML":{"__html":"{\"@context\":\"https://schema.org\",\"@type\":\"QAPage\",\"mainEntity\":{\"@type\":\"Question\",\"name\":\"Pandas missing values and groupby boolean\",\"text\":\"

I found some strange behavior with groupby and missing values.

\\n\\n

import numpy as np\\nimport pandas as pd\\n\\ndf = pd.DataFrame({\\\"A\\\": [2, 1, 1, 2, 2], \\\"B\\\": [False, np.nan, False, np.nan, False]})\\n

\\n\\n

Computing the number of unique values of B per group, I obtain:

\\n\\n

>>> df.groupby('A')['B'].nunique()\\nA\\n1    1\\n2    2\\nName: B, dtype: int64\\n

\\n\\n

Is this a bug in pandas? nunique uses dropna=True by default, so NaN should be ignored and I would expect 1 unique value for each group.

\\n\",\"author\":{\"@type\":\"Person\",\"name\":\"Mathieu Dutour Sikiric\"},\"upvoteCount\":2,\"answerCount\":1,\"acceptedAnswer\":null}}"}}],["$","div",null,{"className":"bg-white shadow-md rounded-lg p-6 mb-6 relative","children":[["$","div",null,{"className":"absolute top-4 right-4 flex flex-wrap space-x-2","children":[["$","span","pandas",{"className":"bg-blue-600 text-white text-sm px-3 py-1 rounded-full","children":["$","$L10",null,{"href":"/discussion/tag/pandas/1","children":"pandas"}]}],["$","span","pandas-groupby",{"className":"bg-blue-600 text-white text-sm px-3 py-1 rounded-full","children":["$","$L10",null,{"href":"/discussion/tag/pandas-groupby/1","children":"pandas-groupby"}]}],["$","span","missing-data",{"className":"bg-blue-600 text-white text-sm px-3 py-1 rounded-full","children":["$","$L10",null,{"href":"/discussion/tag/missing-data/1","children":"missing-data"}]}]]}],["$","div",null,{"className":"flex items-center mb-4","children":[["$","img",null,{"src":"https://i.sstatic.net/x7aYT.jpg?s=256","alt":"Mathieu Dutour Sikiric","className":"w-16 h-16 rounded-full border"}],["$","div",null,{"className":"ml-4","children":[["$","a",null,{"href":"https://stackoverflow.com/users/1255733/mathieu-dutour-sikiric","target":"_blank","rel":"noopener noreferrer","className":"text-lg font-semibold text-blue-600 hover:underline","children":"Mathieu Dutour Sikiric"}],["$","p",null,{"className":"text-sm text-gray-500","children":["Reputation: ",624]}]]}]]}],["$","h1",null,{"className":"text-2xl font-bold text-gray-800 mb-4","children":"Pandas missing values and groupby boolean"}],["$","p",null,{"className":"text-gray-700 mt-4","dangerouslySetInnerHTML":{"__html":"

I found some strange behavior with groupby and missing values.

\n\n

import numpy as np\nimport pandas as pd\n\ndf = pd.DataFrame({\"A\": [2, 1, 1, 2, 2], \"B\": [False, np.nan, False, np.nan, False]})\n

\n\n

Computing the number of unique values of B per group, I obtain:

\n\n

>>> df.groupby('A')['B'].nunique()\nA\n1    1\n2    2\nName: B, dtype: int64\n

\n\n

Is this a bug in pandas? nunique uses dropna=True by default, so NaN should be ignored and I would expect 1 unique value for each group.

\n"}}],["$","div",null,{"className":"text-gray-600 text-sm mt-4","children":[["$","p",null,{"children":["Upvotes: ",2]}],["$","p",null,{"children":["Views: ",96]}]]}]]}],["$","div",null,{"className":"container mx-auto","children":[["$","h2",null,{"className":"text-2xl font-semibold text-gray-800 mb-6","children":["Answers (",1,")"]}],[["$","div","59268453",{"className":"bg-white shadow-md rounded-lg p-6 mb-6","children":[["$","div",null,{"className":"flex items-center mb-4","children":[["$","img",null,{"src":"https://i.sstatic.net/hMDvl.jpg?s=256","alt":"jezrael","className":"w-12 h-12 rounded-full border"}],["$","div",null,{"className":"ml-4","children":[["$","a",null,{"href":"https://stackoverflow.com/users/2901002/jezrael","target":"_blank","rel":"noopener noreferrer","className":"text-lg font-semibold text-blue-600 hover:underline","children":"jezrael"}],["$","p",null,{"className":"text-sm text-gray-500","children":["Reputation: ",863166]}]]}]]}],["$","p",null,{"className":"text-gray-700 mb-4","dangerouslySetInnerHTML":{"__html":"

I think it is a bug. A possible workaround is to pass Series.nunique to agg:

\n\n

print(df.groupby('A')['B'].agg(pd.Series.nunique))\n

\n\n

Or:

\n\n

print(df.groupby('A')['B'].apply(pd.Series.nunique))\nA\n1    1\n2    1\nName: B, dtype: int64\n

\n"}}],["$","div",null,{"className":"text-gray-600 text-sm","children":["$","p",null,{"children":["Upvotes: ",1]}]}]]}]]]}],["$","div",null,{"className":"bg-white shadow-md rounded-lg p-6 mt-6","children":[["$","h2",null,{"className":"text-2xl font-semibold text-gray-800 mb-4","children":"Related Questions"}],["$","ul",null,{"className":"list-disc list-inside","children":[["$","li","69460270",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/69460270","className":"text-blue-600 hover:underline","children":"Missing column values fill based on the available values"}]}],["$","li","68492984",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/68492984","className":"text-blue-600 hover:underline","children":"Pandas missing values using conditions (groupby other columns)"}]}],["$","li","68250510",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/68250510","className":"text-blue-600 hover:underline","children":"Pandas Groupby For Missing Rows"}]}],["$","li","66828398",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/66828398","className":"text-blue-600 hover:underline","children":"Fill missing values using groupby Pandas ValueError"}]}],["$","li","63830982",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/63830982","className":"text-blue-600 hover:underline","children":"Python: Pandas Dataframe, groupby but keeping otherwise missing values"}]}],["$","li","62822426",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/62822426","className":"text-blue-600 hover:underline","children":"Pandas GroupBy without filling in missing data"}]}],["$","li","34489141",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/34489141","className":"text-blue-600 hover:underline","children":"How to use group by and return rows with null 
values"}]}],["$","li","50497328",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/50497328","className":"text-blue-600 hover:underline","children":"pandas groupby for non missing values"}]}],["$","li","49646959",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/49646959","className":"text-blue-600 hover:underline","children":"How can I include missing items using groupby in Pandas?"}]}],["$","li","42195515",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/42195515","className":"text-blue-600 hover:underline","children":"pandas groupby operation with missing data"}]}]]}]]}]]}],["$","$L11",null,{}],["$","$L12",null,{}],["$","$L13",null,{}],["$","$L14",null,{}],["$","$L15",null,{}]]
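
For what it is worth, here is a minimal sketch of the workaround from the answer next to two variations, using the DataFrame from the question. The names `grouped`, `per_group`, `with_nan`, and `cleaned` are illustrative only. A lambda is used instead of passing `pd.Series.nunique` directly, simply so the `dropna` argument can be written out. Dropping the missing rows before grouping is an alternative the answer does not mention.

```python
import numpy as np
import pandas as pd

# DataFrame from the question: a boolean column with missing values.
df = pd.DataFrame({"A": [2, 1, 1, 2, 2],
                   "B": [False, np.nan, False, np.nan, False]})

grouped = df.groupby("A")["B"]

# Call Series.nunique on each group; NaN is dropped by default (dropna=True).
per_group = grouped.agg(lambda s: s.nunique())

# Same call, but counting NaN as a distinct value.
with_nan = grouped.agg(lambda s: s.nunique(dropna=False))

# Alternative: drop missing rows before grouping, so no NaN reaches nunique.
cleaned = df.dropna(subset=["B"]).groupby("A")["B"].nunique()

print(per_group)
print(with_nan)
print(cleaned)
```

With the data above, `per_group` and `cleaned` should both give 1 for each group, while `with_nan` gives 2, since each group then contains both False and NaN.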
