Reputation: 77
When I do df.isnull().sum(), I get the count of null values in each column. But the default axis for .sum() is None, or 0, which I would expect to sum across the columns.
Why does .sum() calculate the sum down the columns, instead of across the rows, when the default says to sum along axis=0?
Thanks!
Upvotes: 2
Views: 15269
Reputation: 1
Here is what I think it does.
First, .isnull() returns True (which counts as 1) for every value that is null and False (which counts as 0) for every value that is not. Then .sum() adds those values up, so each null contributes 1 and the total for a column is the number of nulls in it. If a column has no nulls, .isnull() returns only False for it, and adding those up gives 0.
I hope this helps you understand.
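To make the two steps concrete, here is a minimal sketch. The frame and the variable names (mask, null_counts) are made up for illustration; the point is only that .isnull() gives a boolean frame and .sum() then counts the True values per column.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column A has two nulls, column B has none
df = pd.DataFrame({"A": [np.nan, np.nan, 3], "B": [1, 1, 3]})

# Step 1: isnull() returns a frame of booleans with the same shape
mask = df.isnull()
print(mask)

# Step 2: sum() treats True as 1 and False as 0, so the default
# axis=0 gives one null count per column
null_counts = mask.sum()
print(null_counts)
```

With the default axis the result is indexed by the column labels, which is why it reads as "nulls per column".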
Upvotes: 0
Reputation: 31
The axis parameter is orthogonal to the direction in which you wish to sum.
Unfortunately, the pandas documentation for sum doesn't currently make this clear, but the documentation for count does: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.count.html
Parameters: axis : {0 or ‘index’, 1 or ‘columns’}, default 0. If 0 or ‘index’, counts are generated for each column. If 1 or ‘columns’, counts are generated for each row.
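As a rough illustration of that wording, the sketch below runs count with both axis values on a small made-up frame (the column names x and y are not from the question). It should show one count per column for axis=0 and one count per row for axis=1.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a few missing values
df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [np.nan, np.nan, 6.0]})

# axis=0 (or "index"): non-null counts are generated for each column
per_column = df.count(axis=0)

# axis=1 (or "columns"): non-null counts are generated for each row
per_row = df.count(axis=1)

print(per_column)
print(per_row)
```

The same pattern applies to sum: the result is labeled by whichever axis was not passed in.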
Upvotes: 1
Reputation: 153510
Hmm, this is not the behavior I am seeing. Let's look at this small example.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A':[np.nan, np.nan, 3],'B':[1,1,3]}, index =[*'abc'])
print(df)
print(df.isnull().sum())
print(df.sum())
Note that the column labels are uppercase ('A' and 'B') and the index (row) labels are lowercase.
Output:
     A  B
a  NaN  1
b  NaN  1
c  3.0  3
A    2
B    0
dtype: int64
A    3.0
B    5.0
dtype: float64
Per docs:
axis : {index (0), columns (1)} Axis for the function to be applied on.
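One way to read that line, as I understand it: the axis you pass is the one the function runs along and collapses, so the result is labeled by the other axis. A small sketch using the same frame as above; the NumPy comparison and the variable names are my own addition, not something from the docs quote.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [np.nan, np.nan, 3], "B": [1, 1, 3]}, index=list("abc"))

# axis=0 / "index": run down the index, leaving one value per column
down_rows = df.sum(axis=0)

# axis=1 / "columns": run across the columns, leaving one value per row
across_cols = df.sum(axis=1)

# NumPy follows the same convention; nansum skips NaN the way
# pandas does with its default skipna=True
np_down = np.nansum(df.to_numpy(), axis=0)

print(down_rows)
print(across_cols)
print(np_down)
```

So df.sum() with the default axis=0 is labeled by 'A' and 'B', while df.sum(axis=1) is labeled by 'a', 'b' and 'c'.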
Upvotes: 1
Reputation: 7224
I'm seeing the opposite of the behavior you described.
Sums across the columns (one result per row):
In [3309]: df1.isnull().sum(1)
Out[3309]:
0 0
1 1
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
dtype: int64
Sums down the columns (one result per column):
In [3310]: df1.isnull().sum()
Out[3310]:
date 0
variable 1
value 0
dtype: int64
Upvotes: 2