John

Reputation: 521

removing rows with any column containing NaN, NaTs, and nans

Currently I have data as below:

df_all.head()
Out[2]: 
   Unnamed: 0 Symbol       Date      Close       Weight
0        4061      A 2016-01-13  36.515889   (0.000002)
1        4062     AA 2016-01-14  36.351784    0.000112 
2        4063    AAC 2016-01-15  36.351784   (0.000004)
3        4064    AAL 2016-01-19  36.590483    0.000006 
4        4065   AAMC 2016-01-20  35.934062    0.000002 

df_all.tail()
Out[3]: 
         Unnamed: 0 Symbol Date  Close Weight
1252498    26950320    nan  NaT   9.84    NaN
1252499    26950321    nan  NaT  10.26    NaN
1252500    26950322    nan  NaT   9.99    NaN
1252501    26950323    nan  NaT   9.11    NaN
1252502    26950324    nan  NaT   9.18    NaN

df_all.dtypes
Out[4]: 
Unnamed: 0             int64
Symbol                object
Date          datetime64[ns]
Close                float64
Weight                object
dtype: object

As can be seen, the tail rows contain nan in Symbol, NaT in Date, and NaN in Weight.

MY GOAL: I want to remove every row in which any column contains nan, NaT, or NaN, and store the result in a new df_clean.

I can't seem to apply the appropriate filter. I am not sure whether I have to convert the dtypes first (although I tried that as well).

Upvotes: 3

Views: 1656

Answers (2)

Merlin

Reputation: 25649

The string 'nan' in Symbol is not caught by dropna() or isnull(), because it is a literal string rather than a true missing value. You need to cast the string 'nan' to np.nan first.

Try this:

import numpy as np

df_all["Symbol"] = np.where(df_all["Symbol"] == 'nan', np.nan, df_all["Symbol"])
df_clean = df_all.dropna()
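As a self-contained sketch of this approach, using a small hypothetical frame shaped like the one in the question (the literal string 'nan' in Symbol, plus real NaT/NaN values):

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 holds the string 'nan', a NaT, and a NaN
df_all = pd.DataFrame({
    "Symbol": ["A", "nan", "AAL"],
    "Date": pd.to_datetime(["2016-01-13", None, "2016-01-19"]),
    "Weight": [0.000002, np.nan, 0.000006],
})

# Turn the literal string 'nan' into a real np.nan so dropna() can see it
df_all["Symbol"] = np.where(df_all["Symbol"] == "nan", np.nan, df_all["Symbol"])

# Now dropna() removes any row with a missing value in any column
df_clean = df_all.dropna()
```

After the cast, only the two fully populated rows survive.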

Upvotes: 0

Ami Tavory

Reputation: 76316

You can use

df_all.replace({'nan': None})[~pd.isnull(df_all).any(axis=1)]

This works because replace converts the string 'nan' to None, and isnull recognizes NaN, NaT, and None alike as "null" values.
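A minimal runnable sketch of this replace-then-mask approach, assuming a small hypothetical frame with the same mix of the string 'nan', NaT, and NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 is entirely "missing" in the question's sense
df_all = pd.DataFrame({
    "Symbol": ["A", "nan"],
    "Date": pd.to_datetime(["2016-01-13", None]),
    "Weight": [0.000002, np.nan],
})

# replace maps the string 'nan' to None; isnull then flags NaN, NaT and None
replaced = df_all.replace({"nan": None})
df_clean = replaced[~pd.isnull(replaced).any(axis=1)]
```

Unlike the dropna() route, this keeps df_all untouched and builds the mask in one pass over the replaced frame.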

Upvotes: 4

Related Questions