Reputation: 187
I have a dataframe with many NaNs that were somehow produced when transforming the data.
So I tried to drop them using:
df = df.dropna(how='all')
I still get the same all-NaN result (I know I'm only showing 3 columns, but all of the columns are filled with NaNs).
I've also tried assuming they're strings and using:
df = df[~df.isin(['NaN']).any(axis=1)]
This also didn't work. Any other thoughts or ideas?
Upvotes: 0
Views: 811
Reputation: 59579
When you slice with a Boolean DataFrame, the logic used is DataFrame.where: where the mask is True it returns the value, and where the mask is False it by default substitutes np.nan.

Thus, if you slice with df.isna(), you by definition NaN everything: where df.isna() is True it passes the value (which is NaN), and where the df was not null, where passes NaN.
import pandas as pd
import numpy as np

df = pd.DataFrame({'foo': np.nan, 'bar': np.nan, 'baz': np.nan, 'boo': 1}, index=['A'])
#   foo  bar  baz  boo
#A  NaN  NaN  NaN    1

df.isnull()
#    foo   bar   baz    boo
#A  True  True  True  False

# Boolean-frame slicing is where under the hood, so both of these
# turn every cell into NaN:
df[df.isnull()]
#   foo  bar  baz  boo
#A  NaN  NaN  NaN  NaN

df.where(df.isnull())
#   foo  bar  baz  boo
#A  NaN  NaN  NaN  NaN
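As an aside, where also takes an other argument that controls what gets substituted where the mask is False, which makes the mechanism explicit:

df.where(df.isnull(), other=0)
#   foo  bar  baz  boo
#A  NaN  NaN  NaN    0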
So you don't have rows full of NaN; your mask just guarantees every cell becomes NaN. If you want to inspect the rows that contain NaN without modifying the values, display the rows with at least one NaN:
df[df.isnull().any(axis=1)]
# foo bar baz boo
#A NaN NaN NaN 1
Or, to see the distribution of NaN across the rows, take the value counts of the sum across rows. This shows we have 1 row with 3 null values:
df.isnull().sum(axis=1).value_counts()
#3 1
#dtype: int64
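If the goal is to actually drop rows, here is a short sketch reusing the frame above; the literal-string 'NaN' case at the end is an assumption about how your transformation may have produced the values:

df.dropna(how='all')   # keeps row A: 'boo' is 1, so the row is not all-NaN
#   foo  bar  baz  boo
#A  NaN  NaN  NaN    1

df.dropna(how='any')   # drops row A: it contains at least one NaN
#Empty DataFrame
#Columns: [foo, bar, baz, boo]
#Index: []

# If the 'NaN's are really the string 'NaN' (an assumption), convert
# them to real missing values first, then drop:
df.replace('NaN', np.nan).dropna(how='all')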
Upvotes: 1