Christine Jiang

Reputation: 77

Why does df.isnull().sum() work the way it does?

When I do df.isnull().sum(), I get the count of null values in a column. But the default axis for .sum() is None, or 0 - which should be summing across the columns.

Why does .sum() calculate the sum down the columns, instead of the rows, when the default says to sum across axis = 0?

Thanks!

Upvotes: 2

Views: 15269

Answers (4)

harshit joshi

Reputation: 1

Here's what I think it does. First, .isnull() returns True for every value that is null and False for every value that is not. Then .sum() treats each True as 1 and each False as 0 and adds them up, so the total for a column is the number of null values in it. If a column has no null values, .isnull() returns only False for it, and the sum is 0. I hope this helps.
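
To make the counting concrete, here is a minimal sketch. The frame and its column names (x, y) are made up for illustration and are not from the question. Note that .isnull() yields True/False values, which .sum() counts as 1/0.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame, made up for illustration.
df = pd.DataFrame({"x": [1.0, np.nan, np.nan], "y": ["a", "b", None]})

mask = df.isnull()        # Boolean DataFrame: True where a value is missing
null_counts = mask.sum()  # True counts as 1, False as 0, summed per column

print(mask)
print(null_counts)
```

Here null_counts holds one entry per column, because the default axis=0 collapses the rows.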

Upvotes: 0

human3

Reputation: 31

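The axis parameter names the axis that gets collapsed, not the axis the result is labeled by. With axis=0 the values are summed along the index (down the rows), which leaves one result per column.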

Unfortunately, the pandas documentation for sum doesn't currently make this clear, but the documentation for count does: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.count.html

Parameters

axis : {0 or ‘index’, 1 or ‘columns’}, default 0
    If 0 or ‘index’, counts are generated for each column. If 1 or ‘columns’, counts are generated for each row.
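
As a rough illustration of that wording, the sketch below applies both axis values to a small made-up frame (the labels A, B and a, b, c are assumptions for the example). It only shows which labels the result ends up with in each case.

```python
import numpy as np
import pandas as pd

# Small made-up frame: column 'A' has two NaNs, column 'B' has none.
df = pd.DataFrame({"A": [np.nan, np.nan, 3], "B": [1, 1, 3]}, index=list("abc"))

per_column = df.isnull().sum(axis=0)  # collapse the index: one count per column
per_row = df.isnull().sum(axis=1)     # collapse the columns: one count per row
non_null = df.count(axis=0)           # count follows the same axis convention

print(per_column)
print(per_row)
print(non_null)
```

With axis=0 the result is indexed by the column labels, and with axis=1 it is indexed by the row labels.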

Upvotes: 1

Scott Boston

Reputation: 153510

That's not the behavior I'm seeing. Let's look at a small example.

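import pandas as pd
import numpy as np

# Column 'A' has two NaNs; column 'B' has none.
df = pd.DataFrame({'A': [np.nan, np.nan, 3], 'B': [1, 1, 3]}, index=[*'abc'])
print(df)
print(df.isnull().sum())  # null count per column
print(df.sum())           # column totals (NaN is skipped by default)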

Note that the column labels are uppercase ('A' and 'B') and the row labels (the index) are lowercase.

Output:

     A  B
a  NaN  1
b  NaN  1
c  3.0  3

A    2
B    0
dtype: int64

A    3.0
B    5.0
dtype: float64

Per docs:

axis : {index (0), columns (1)} Axis for the function to be applied on.

Upvotes: 1

oppressionslayer

Reputation: 7224

I'm seeing the opposite of the behavior you described:

Sums across the columns:

In [3309]: df1.isnull().sum(1)
Out[3309]: 
0     0
1     1
2     0
3     0
4     0
5     0
6     0
7     0
8     0
9     0
10    0
11    0
dtype: int64

Sums down the columns:

In [3310]: df1.isnull().sum()
Out[3310]: 
date        0
variable    1
value       0
dtype: int64
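
The session above does not show how df1 was built, so here is a hedged, reproducible stand-in. The contents of df1 below are hypothetical; they were chosen only so that the frame has 12 rows, the three column names shown, and a single null in 'variable' at row 1.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df1 (the real data is not shown in the answer).
df1 = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=12),
    "variable": ["x", None] + ["x"] * 10,
    "value": np.arange(12.0),
})

row_counts = df1.isnull().sum(axis=1)  # one null count per row
col_counts = df1.isnull().sum()        # default axis=0: one null count per column

print(row_counts)
print(col_counts)
```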

Upvotes: 2
