user2305776
user2305776

Reputation: 13

filter pandas dataframe columns with null data

I have a pandas dataframe with 200+ columns. I'm trying to inspect all the columns with null data. How can I filter/display the columns which have null data? df.isnull().sum() lists count of all columns, but I want to see only columns with non-zero null data count as the number of columns is high.

Upvotes: 1

Views: 15245

Answers (2)

Karn Kumar
Karn Kumar

Reputation: 8816

Newer Pandas versions have new methods DataFrame.isna() and DataFrame.notna()

1) Using DataFrame.isna() method !

>>> df
    A     B     C     D  E      F
0   0   1.0   2.0     3  4    one
1   3   5.0   NaN   NaT  5    two
2   8   NaN  10.0  None  6  three
3  11  12.0  13.0   NaT  7   four

To get Just the List of Columns which are null values:

>>> df.columns[df.isna().any()].tolist()
['B', 'C', 'D']

To list down all the columns which are having nan values.

>>> df.loc[:, df.isna().any()]
      B     C     D
0   1.0   2.0     3
1   5.0   NaN   NaT
2   NaN  10.0  None
3  12.0  13.0   NaT

2) Using DataFrame.isnull() method !

To get Just the List of Columns which are null values, returns type is boolean.

>>> df.isnull().any()
A    False
B     True
C     True
D     True
E    False
F    False
dtype: bool

To get Just the List of Columns which are null having values:

>>> df.columns[df.isnull().any()].tolist()
['B', 'C', 'D']

To select a subset - all columns containing at least one NaN value:

>>> df.loc[:, df.isnull().any()]
      B     C     D
0   1.0   2.0     3
1   5.0   NaN   NaT
2   NaN  10.0  None
3  12.0  13.0   NaT

If you want to count the missing values in each column:

>>> df.isnull().sum()
A    0
B    1
C    1
D    3
E    0
F    0
dtype: int64

OR

>>> df.isnull().sum(axis=0)  # axis=0 , across the columns
A    0
B    1
C    1
D    3
E    0
F    0

# >>> df.isnull().sum(axis=1)  # across the rows

Finally, to get the total number of NaN & non NaN values in the DataFrame:

Nan value counts

>>> df.isnull().sum().sum()

Non NaN value count

>>> df.notnull().sum().sum()

Upvotes: 5

grge
grge

Reputation: 145

Once you've got the counts, just filter on the entries greater than zero:

null_counts = df.isnull().sum()
null_counts[null_counts > 0]

Upvotes: 1

Related Questions