Peter.k

Reputation: 1548

How to count nulls in a group rowwise in pandas DataFrame

Following this topic https://stackoverflow.com/questions/19384532/how-to-count-number-of-rows-per-group-and-other-statistics-in-pandas-group-by I'd like to add one more statistic: a count of the null values (a.k.a. NaN) in the DataFrame:

import numpy as np
import pandas as pd

tdf = pd.DataFrame(columns=['indicator', 'v1', 'v2', 'v3', 'v4'],
                   data=[['A', '3', np.nan, '4', np.nan],
                         ['A', '3', '4', '4', np.nan],
                         ['B', np.nan, np.nan, np.nan, np.nan],
                         ['B', '1', None, np.nan, None],
                         ['C', '9', '7', '4', '0']])

I'd like to use something like this:

tdf.groupby('indicator').agg({'indicator': ['count']})

but with an additional null counter in a separate column, like:

tdf.groupby('indicator').agg({'indicator': ['count', 'isnull']})

Now I get an error: AttributeError: Cannot access callable attribute 'isnull' of 'SeriesGroupBy' objects, try using the 'apply' method

How can I access the pd.isnull() function here, or use something with equivalent functionality?

Expected output would be:

          indicator      nulls
              count      count
indicator          
A                 2          3
B                 2          7
C                 1          0

Note that np.nan and None behave the same way here.
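For illustration (using the tdf defined above), both markers are recognised by pd.isnull, so row-wise counts already include them:

print(pd.isnull(np.nan), pd.isnull(None))  # True True
print(tdf.isnull().sum(axis=1))            # per-row null counts: 2, 1, 4, 3, 0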

Upvotes: 1

Views: 3445

Answers (2)

Peter.k

Reputation: 1548

I've found an almost satisfying answer myself (con: it's a bit too complicated). In R, for example, I'd use rowSums on the is.na(df) matrix. This works the same way, but unfortunately needs more code.

def count_nulls_rowwise_by_group(tdf, group):
    # put the grouping column next to a row-wise null count, then aggregate per group
    cdf = pd.concat([tdf[group], pd.isnull(tdf).sum(axis=1).rename('nulls')], axis=1)
    return (cdf.groupby(group)
               .agg({group: 'count', 'nulls': 'sum'})
               .rename(index=str, columns={group: 'count'}))

count_nulls_rowwise_by_group(tdf, 'indicator')

gives:

Out[387]: 
           count  nulls
indicator              
A              2      3
B              2      7
C              1      0
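A slightly shorter variant of the same idea (just a sketch; it only relies on assign and a plain list aggregation, so it should work on any recent pandas):

def count_nulls_rowwise_by_group2(tdf, group):
    # row-wise null counts, then group size and null sum per indicator
    return (tdf.assign(nulls=tdf.isnull().sum(axis=1))
               .groupby(group)['nulls']
               .agg(['size', 'sum'])
               .rename(columns={'size': 'count', 'sum': 'nulls'}))

count_nulls_rowwise_by_group2(tdf, 'indicator')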

Upvotes: 0

jezrael

Reputation: 862701

First set_index, flag the missing values with isnull, count them per row with sum, and then aggregate with count and sum:

df = tdf.set_index('indicator').isnull().sum(axis=1).groupby(level=0).agg(['count','sum'])
print (df)
           count  sum
indicator            
A              2    3
B              2    7
C              1    0

Detail:

print (tdf.set_index('indicator').isnull().sum(axis=1))
indicator
A    2
A    1
B    4
B    3
C    0
dtype: int64
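The same aggregation also works without set_index, by grouping the row-wise null counts by the indicator column directly (a small variant of the above, not the answer's original code):

nulls = tdf.drop(columns='indicator').isnull().sum(axis=1)
print(nulls.groupby(tdf['indicator']).agg(['count', 'sum']))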

Another solution is to use a custom function with GroupBy.apply:

def func(x):
    # x is the sub-DataFrame for one group (indicator moved to the index)
    a = len(x)                    # rows in the group
    b = x.isnull().values.sum()   # missing cells in the group
    return pd.Series([a, b], index=['indicator count', 'nulls count'])

df = tdf.set_index('indicator').groupby('indicator').apply(func)
print (df)
           indicator count  nulls count
indicator                              
A                        2            3
B                        2            7
C                        1            0
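To see what func receives and returns for a single group, it can be called by hand on the two 'A' rows (a quick check, assuming the tdf from the question):

grp_a = tdf.set_index('indicator').loc[['A']]
print(func(grp_a))
# indicator count    2
# nulls count        3
# dtype: int64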

Upvotes: 1
