Reputation: 537
df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
'B' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
'C' : [np.nan, 'bla2', np.nan, 'bla3', np.nan, np.nan, np.nan, np.nan]})
Output:
A B C
0 foo one NaN
1 bar one bla2
2 foo two NaN
3 bar three bla3
4 foo two NaN
5 bar two NaN
6 foo one NaN
7 foo three NaN
I would like to use groupby in order to count the number of NaN's for the different combinations of foo.
Expected Output (EDIT):
A B C D
0 foo one NaN 2
1 bar one bla2 0
2 foo two NaN 2
3 bar three bla3 0
4 foo two NaN 2
5 bar two NaN 1
6 foo one NaN 2
7 foo three NaN 1
Currently I am trying this:
df['count']=df.groupby(['A'])['B'].isnull().transform('sum')
But this is not working...
Thank You
Upvotes: 49
Views: 89086
Reputation: 328
just add this parameter dropna=False
df.groupby(['A', 'B','C'], dropna=False).size()
check the documentation: dropnabool, default True If True, and if group keys contain NA values, NA values together with row/column will be dropped. If False, NA values will also be treated as the key in groups.
Upvotes: 5
Reputation: 862731
I think you need groupby
with sum
of NaN
values:
df2 = df.C.isnull().groupby([df['A'],df['B']]).sum().astype(int).reset_index(name='count')
print(df2)
A B count
0 bar one 0
1 bar three 0
2 bar two 1
3 foo one 2
4 foo three 1
5 foo two 2
Notice that the .isnull()
is on the original Dataframe column, not on the groupby()
-object. The groupby()
does not have .isnull()
but if it would have it, it would be expected to give the same result as with .isnull()
on the original DataFrame.
If need filter first add boolean indexing
:
df = df[df['A'] == 'foo']
df2 = df.C.isnull().groupby([df['A'],df['B']]).sum().astype(int)
print(df2)
A B
foo one 2
three 1
two 2
Or simpler:
df = df[df['A'] == 'foo']
df2 = df['B'].value_counts()
print(df2)
one 2
two 2
three 1
Name: B, dtype: int64
EDIT: Solution is very similar, only add transform
:
df['D'] = df.C.isnull().groupby([df['A'],df['B']]).transform('sum').astype(int)
print(df)
A B C D
0 foo one NaN 2
1 bar one bla2 0
2 foo two NaN 2
3 bar three bla3 0
4 foo two NaN 2
5 bar two NaN 1
6 foo one NaN 2
7 foo three NaN 1
Similar solution:
df['D'] = df.C.isnull()
df['D'] = df.groupby(['A','B'])['D'].transform('sum').astype(int)
print(df)
A B C D
0 foo one NaN 2
1 bar one bla2 0
2 foo two NaN 2
3 bar three bla3 0
4 foo two NaN 2
5 bar two NaN 1
6 foo one NaN 2
7 foo three NaN 1
Upvotes: 58
Reputation: 4060
df[df.A == 'foo'].groupby('b').agg({'C': lambda x: x.isnull().sum()})
returns:
=> C
B
one 2
three 1
two 2
Upvotes: 39