Reputation: 10980
I have the following pandas DataFrame with a NaN in it.
import pandas as pd
df = pd.DataFrame([1,2,3,float('nan')], columns=['A'])
df
A
0 1
1 2
2 3
3 NaN
I also have the list filter_list, which I want to use to filter my DataFrame. But if I use the .isin() function, it does not detect the NaN. Instead of getting True, I am getting False in the last row:
filter_list = [1, float('nan')]
df['A'].isin(filter_list)
0 True
1 False
2 False
3 False
Name: A, dtype: bool
Expected output:
0 True
1 False
2 False
3 True
Name: A, dtype: bool
I know that I can use .isnull() to check for NaNs, but here I have other values to check as well. I am using pandas version 0.16.0.
Edit: The list filter_list comes from the user, so it might or might not contain NaN. That's why I am using .isin().
Upvotes: 5
Views: 14985
Reputation: 365
I think that the simplest way is to use numpy.nan:
import pandas as pd
import numpy as np
df = pd.DataFrame([1, 2, 3, np.nan], columns=['A'])
filter_list = [1, np.nan]
df['A'].isin(filter_list)
Upvotes: 2
Reputation: 11988
You could replace nan with a unique non-NaN value that will not occur in your list, say 'NA' or ''. For example:
In [22]: import numpy as np
In [23]: import pandas as pd
In [24]: df = pd.DataFrame([1, 2, 3, np.nan], columns=['A'])
In [25]: filter_list = pd.Series([1, np.nan])
In [26]: na_equiv = 'NA'
In [27]: df['A'].replace(np.nan, na_equiv).isin(filter_list.replace(np.nan, na_equiv))
Out[27]:
0 True
1 False
2 False
3 True
Name: A, dtype: bool
Upvotes: 6
Reputation: 97331
If you really want to use isin() to match NaN, you can create a class that has the same hash as nan and returns True when compared to nan:
import numpy as np
import pandas as pd

class NAN(object):
    def __eq__(self, v):
        return np.isnan(v)

    def __hash__(self):
        return hash(np.nan)

nan = NAN()
df = pd.DataFrame([1,2,3,float('nan')], columns=['A'])
df.A.isin([1, nan])
Upvotes: 1
Reputation: 880757
The float NaN has the interesting property that it is not equal to itself:
In [194]: float('nan') == float('nan')
Out[194]: False
isin checks for equality. So you can't use isin to check if a value equals NaN.
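As a small illustration of why equality-based lookups miss NaN, here is a plain-Python sketch (no pandas involved; the variable names are only for illustration). It also shows that list membership can appear to "find" a NaN when it is the very same object, because Python checks identity before equality:

```python
import math

a = float('nan')
b = float('nan')  # a second, distinct NaN object

# NaN never compares equal, not even to itself.
equal_to_self = (a == a)
equal_to_other = (a == b)

# Membership tests check identity first, then equality, so the same
# object is found while a different NaN object is not.
same_object_found = a in [a]
other_object_found = a in [b]

# An explicit NaN test does not rely on equality at all.
is_nan = math.isnan(a)
```

This is why an explicit NaN check (math.isnan, np.isnan, or the pandas isnull method) is more dependable than any comparison against a NaN value.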
To check for NaNs it is best to use the Series isnull() method (np.isnull does not exist; the NumPy equivalent is np.isnan):
In [200]: df['A'].isin([1]) | df['A'].isnull()
Out[200]:
0 True
1 False
2 False
3 True
Name: A, dtype: bool
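Since the question says the list comes from the user and may or may not contain NaN, the same idea could be wrapped in a small helper. This is only a sketch: the name isin_with_nan is hypothetical (not a pandas function), and it assumes the user-supplied values are scalars.

```python
import pandas as pd

def isin_with_nan(series, values):
    """Hypothetical helper: like Series.isin, but a NaN (or None) in
    `values` also matches the missing entries of `series`."""
    values = list(values)
    # pd.isnull handles arbitrary scalars (floats, strings, None).
    wants_nan = any(pd.isnull(v) for v in values)
    non_nan = [v for v in values if not pd.isnull(v)]
    mask = series.isin(non_nan)
    if wants_nan:
        # Match missing values explicitly instead of relying on equality.
        mask = mask | series.isnull()
    return mask

df = pd.DataFrame([1, 2, 3, float('nan')], columns=['A'])
with_nan = isin_with_nan(df['A'], [1, float('nan')])
without_nan = isin_with_nan(df['A'], [1, 2])
```

Because the NaN is stripped from the list and handled separately with isnull(), the result does not depend on how a particular pandas version compares NaN inside isin.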
Upvotes: 8