Reputation: 13
I have a pandas dataframe with 200+ columns. I'm trying to inspect all the columns with null data. How can I filter/display the columns which have null data? df.isnull().sum() lists count of all columns, but I want to see only columns with non-zero null data count as the number of columns is high.
Upvotes: 1
Views: 15245
Reputation: 8816
Newer Pandas versions have new methods DataFrame.isna() and DataFrame.notna()
1) Using DataFrame.isna()
method !
>>> df
A B C D E F
0 0 1.0 2.0 3 4 one
1 3 5.0 NaN NaT 5 two
2 8 NaN 10.0 None 6 three
3 11 12.0 13.0 NaT 7 four
To get Just the List of Columns which are null values:
>>> df.columns[df.isna().any()].tolist()
['B', 'C', 'D']
To list down all the columns which are having nan values.
>>> df.loc[:, df.isna().any()]
B C D
0 1.0 2.0 3
1 5.0 NaN NaT
2 NaN 10.0 None
3 12.0 13.0 NaT
2) Using DataFrame.isnull()
method !
To get Just the List of Columns which are null values, returns type is boolean.
>>> df.isnull().any()
A False
B True
C True
D True
E False
F False
dtype: bool
To get Just the List of Columns which are null having values:
>>> df.columns[df.isnull().any()].tolist()
['B', 'C', 'D']
To select a subset - all columns containing at least one NaN
value:
>>> df.loc[:, df.isnull().any()]
B C D
0 1.0 2.0 3
1 5.0 NaN NaT
2 NaN 10.0 None
3 12.0 13.0 NaT
If you want to count the missing values in each column:
>>> df.isnull().sum()
A 0
B 1
C 1
D 3
E 0
F 0
dtype: int64
OR
>>> df.isnull().sum(axis=0) # axis=0 , across the columns
A 0
B 1
C 1
D 3
E 0
F 0
# >>> df.isnull().sum(axis=1) # across the rows
Finally, to get the total number of NaN & non NaN values in the DataFrame:
Nan value counts
>>> df.isnull().sum().sum()
Non NaN value count
>>> df.notnull().sum().sum()
Upvotes: 5
Reputation: 145
Once you've got the counts, just filter on the entries greater than zero:
null_counts = df.isnull().sum()
null_counts[null_counts > 0]
Upvotes: 1