Jameson

Reputation: 187

Pandas dropna() not working (it's definitely not one of the common reasons why!)

I have this dataframe:

[screenshot of the dataframe]

There are many NaNs that were somehow produced when transforming data:

[screenshot of the dataframe showing the NaN values]

So I try to drop them using:

df = df.dropna(how='all')

I still just get this (I know I'm only showing 3 columns, but all of the columns are filled with NaNs):

[screenshot of the result, still full of NaNs]

I've also tried assuming they're strings and using:

df = df[~df.isin(['NaN']).any(axis=1)]

This also didn't work. Any other thoughts or ideas?

Upvotes: 0

Views: 811

Answers (1)

ALollz

Reputation: 59579

When you slice with a Boolean DataFrame, the logic used is where. That is, where the mask is True it returns the value, and where the mask is False it substitutes np.nan by default.
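
As a quick illustration of that general rule (a minimal sketch on a made-up frame, separate from the example below), slicing with any Boolean DataFrame blanks out the cells where the mask is False:

import pandas as pd

# hypothetical frame, just to show the where-style behaviour of Boolean slicing
tmp = pd.DataFrame({'x': [1, 2], 'y': [3, 4]})

tmp[tmp > 1]   # equivalent to tmp.where(tmp > 1)
#     x  y
#0  NaN  3
#1  2.0  4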

Thus, if you slice with df.isna(), by definition you turn everything into NaN: where df.isna() is True it passes the value (which is NaN), and where the df was not null, where passes NaN anyway.

import pandas as pd
import numpy as np

df = pd.DataFrame({'foo': np.nan, 'bar': np.nan, 'baz': np.nan, 'boo': 1}, index=['A'])
#   foo  bar  baz  boo
#A  NaN  NaN  NaN    1

df.isnull()
#    foo   bar   baz    boo
#A  True  True  True  False

df[df.isnull()]
#   foo  bar  baz  boo
#A  NaN  NaN  NaN  NaN

df.where(df.isnull())
#   foo  bar  baz  boo
#A  NaN  NaN  NaN  NaN

So you don't have rows full of NaN; your mask just guarantees every cell in the result becomes NaN. If you want to inspect the rows that contain NaN without modifying the values, you can display the rows with at least one NaN:

df[df.isnull().any(axis=1)]
#   foo  bar  baz  boo
#A  NaN  NaN  NaN    1

Or, to see the distribution of NaN across the rows, take the value counts of the row-wise sum. This shows we have one row with 3 null values:

df.isnull().sum(axis=1).value_counts()
#3    1
#dtype: int64
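
That also explains why dropna(how='all') appears to do nothing: it only drops rows where every value is null, and the example row still has a real value in boo. If the goal is to drop rows containing any NaN, plain dropna() (how='any' is the default) will do it. A minimal sketch on the same toy frame; whether that is the right fix depends on your real data:

df.dropna(how='all')   # row A is kept because 'boo' is not null
#   foo  bar  baz  boo
#A  NaN  NaN  NaN    1

df.dropna(how='any')   # drops every row containing at least one NaN
#Empty DataFrame
#Columns: [foo, bar, baz, boo]
#Index: []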

Upvotes: 1
