Reputation: 521
Currently I have data as below:
df_all.head()
Out[2]:
Unnamed: 0 Symbol Date Close Weight
0 4061 A 2016-01-13 36.515889 (0.000002)
1 4062 AA 2016-01-14 36.351784 0.000112
2 4063 AAC 2016-01-15 36.351784 (0.000004)
3 4064 AAL 2016-01-19 36.590483 0.000006
4 4065 AAMC 2016-01-20 35.934062 0.000002
df_all.tail()
Out[3]:
Unnamed: 0 Symbol Date Close Weight
1252498 26950320 nan NaT 9.84 NaN
1252499 26950321 nan NaT 10.26 NaN
1252500 26950322 nan NaT 9.99 NaN
1252501 26950323 nan NaT 9.11 NaN
1252502 26950324 nan NaT 9.18 NaN
df_all.dtypes
Out[4]:
Unnamed: 0 int64
Symbol object
Date datetime64[ns]
Close float64
Weight object
dtype: object
As can be seen, I am getting values of nan in Symbol, NaT in Date, and NaN in Weight.
MY GOAL: I want to remove any row in which ANY column contains nan, NaT or NaN, and have a new df_clean as the result.
I can't seem to apply the appropriate filter. I am not sure if I have to convert the datatypes first (although I tried this as well).
Upvotes: 3
Views: 1656
Reputation: 25649
The string 'nan' is not caught by dropna() or isnull(), so you need to cast the string 'nan' to np.nan first. Try this:
import numpy as np

# Replace the literal string 'nan' with a real missing value, then drop
df_all["Symbol"] = np.where(df_all["Symbol"] == 'nan', np.nan, df_all["Symbol"])
df_clean = df_all.dropna()
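A minimal self-contained sketch of this approach, using a made-up three-row frame with the same column names and missing-value patterns as the question (the string 'nan' in Symbol, NaT in Date, NaN in Weight):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df_all (values invented for the demo)
df = pd.DataFrame({
    "Symbol": ["A", "nan", "AAL"],
    "Date": pd.to_datetime(["2016-01-13", None, "2016-01-19"]),
    "Close": [36.52, 9.84, 36.59],
    "Weight": [0.000002, np.nan, 0.000006],
})

# Cast the string 'nan' to np.nan so dropna() can see it, then drop
df["Symbol"] = np.where(df["Symbol"] == "nan", np.nan, df["Symbol"])
df_clean = df.dropna()

print(df_clean.shape)  # the middle row ('nan'/NaT/NaN) is gone
```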
Upvotes: 0
Reputation: 76316
You can use

df_all.replace({'nan': None})[~pd.isnull(df_all).any(axis=1)]

This is because isnull recognizes both NaN and NaT as "null" values.
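A quick check of that claim on a tiny made-up frame (column names and values invented for the demo): the second row is dropped because pd.isnull flags its NaT and NaN cells, and replace turns the leftover 'nan' strings into real missing values.

```python
import numpy as np
import pandas as pd

# Two-row frame: row 0 is clean, row 1 holds 'nan', NaT and NaN
df = pd.DataFrame({
    "Symbol": ["A", "nan"],
    "Date": pd.to_datetime(["2016-01-13", None]),
    "Weight": [0.000002, np.nan],
})

# isnull catches NaT and NaN alike, so any(axis=1) marks row 1
df_clean = df.replace({"nan": None})[~pd.isnull(df).any(axis=1)]
print(df_clean.shape)  # only the clean first row remains
```

Note that the boolean mask here is built from the original frame, so rows whose only problem is the string 'nan' are caught by the replace step rather than the mask.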
Upvotes: 4