Reputation: 6824
I am currently playing with the Kaggle Titanic dataset (train.csv). The Embarked column has nan values, but when I tried to filter on them using the following code, I got an empty result:
import pandas as pd
df = pd.read_csv(<file_loc>, header=0)
df[df.Embarked == 'nan']
I tried importing numpy.nan to replace the string 'nan' above, but that doesn't work either.
What I am trying to find is all the cells that are not 'S', 'C', or 'Q'.
I also realised later that the nan is a float, using type(df.Embarked.unique()[-1]). Could someone help me understand how to identify those nan cells?
Upvotes: 4
Views: 7512
Reputation: 1465
Starting from v1.5, pandas introduced some changes related to this topic.
As mentioned in the official documentation, the pandas missing-value placeholder should be pd.NA, but there are some corner cases in which np.nan creates problems.
> A = pd.Series([0.1, 0, None], dtype="Float32")
> A
0 0.1
1 0.0
2 <NA>
dtype: Float32
You cannot assign nan to it; it will be translated into <NA>:
> A[0] = np.nan
> A
0 <NA>
1 0.0
2 <NA>
dtype: Float32
The .isna() method, as explained before, is supposed to find all NA values:
> A.isna()
0 True
1 False
2 True
dtype: bool
But pandas can sometimes introduce np.nan when a computation is undefined, such as 0/0:
> A/A
0 <NA>
1 NaN
2 <NA>
dtype: Float32
And .isna() does not flag it:
> (A/A).isna()
0 True
1 False
2 True
dtype: bool
But you can count on the different equality behavior:
> pd.NA == pd.NA
<NA>
> np.nan == np.nan
False
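If you need to catch both kinds of missing value in a nullable float Series, one option is to convert it to a plain NumPy float array first, so that <NA> becomes nan and np.isnan sees both. A minimal sketch, assuming a small made-up Series (the names s, r, as_float and missing are only for illustration):

```python
import numpy as np
import pandas as pd

# Made-up nullable Float32 series: a regular value, a zero and a missing value.
s = pd.Series([1.0, 0.0, None], dtype="Float32")

# 0.0 / 0.0 is undefined, so position 1 ends up missing (NaN or <NA>,
# depending on the pandas version); position 2 stays <NA>.
r = s / s

# Convert to a plain float64 array, mapping <NA> to nan, then test for nan.
as_float = r.to_numpy(dtype="float64", na_value=np.nan)
missing = np.isnan(as_float)

print(missing)  # [False  True  True]
```

This does not distinguish NaN from <NA>; it only gives a single mask that covers both.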
Upvotes: 0
Reputation: 11395
NaN is used to represent missing values.
.isna(): detect missing values.
.fillna(value): fill NA/NaN values.
Some examples on a series called col:
>>> col
0 1.0
1 NaN
2 2.0
dtype: float64
>>> col[col.isna()]
1 NaN
dtype: float64
>>> col.index[col.isna()]
Int64Index([1], dtype='int64')
>>> col.fillna(-1)
0 1.0
1 -1.0
2 2.0
dtype: float64
Note that you can’t test for equality with nan, as by definition it’s not equal to anything, not even itself:
>>> np.nan == np.nan
False
This is likely the property that is used to identify nan under the hood:
>>> col != col
0 False
1 True
2 False
dtype: bool
But it’s better (more readable) to use the pandas functions than to test for inequality yourself.
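Applied to the Embarked column from the question, this could look roughly like the sketch below. The small DataFrame is a made-up stand-in for train.csv (the values are not taken from the real file), and the variable names are only illustrative:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Titanic train.csv; only Embarked matters here.
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Embarked": ["S", np.nan, "C", "Q"],
})

# Rows where Embarked is missing.
missing_rows = df[df["Embarked"].isna()]

# Rows where Embarked is anything other than 'S', 'C' or 'Q'.
# NaN is included because it is never a member of the list.
unexpected_rows = df[~df["Embarked"].isin(["S", "C", "Q"])]

print(missing_rows)
print(unexpected_rows)
```

The isin version also picks up any unexpected non-missing values, which is closer to "all the cells which are not 'S', 'C', 'Q'".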
Upvotes: 4