Kathirmani Sukumar

Reputation: 10980

pandas checking for nan not working using .isin()

I have the following pandas DataFrame with a NaN in it.

import pandas as pd
df = pd.DataFrame([1,2,3,float('nan')], columns=['A'])
df

    A
0   1
1   2
2   3
3 NaN

I also have a list, filter_list, that I want to use to filter my DataFrame. But when I use .isin(), it does not detect the NaN: instead of True, I get False in the last row.

filter_list = [1, float('nan')]

df['A'].isin(filter_list)
0     True
1    False
2    False
3    False
Name: A, dtype: bool

Expected output:

0     True
1    False
2    False
3     True
Name: A, dtype: bool

I know that I can use .isnull() to check for NaNs, but here I have other values to check as well. I am using pandas 0.16.0.

Edit: The list filter_list comes from the user, so it may or may not contain NaN. That's why I am using .isin().

Upvotes: 5

Views: 14985

Answers (4)

shahar

Reputation: 365

I think that the simplest way is to use numpy.nan:

import pandas as pd
import numpy as np

df = pd.DataFrame([1, 2, 3, np.nan], columns=['A'])
filter_list = [1, np.nan]
df['A'].isin(filter_list)

Upvotes: 2

S Anand

Reputation: 11988

You could replace nan with a unique non-NaN value that will not occur in your list, say 'NA' or ''. For example:

In [23]: import numpy as np
    ...: import pandas as pd

In [24]: df = pd.DataFrame([1, 2, 3, np.nan], columns=['A'])

In [25]: filter_list = pd.Series([1, np.nan])

In [26]: na_equiv = 'NA'

In [27]: df['A'].replace(np.nan, na_equiv).isin(filter_list.replace(np.nan, na_equiv))
Out[27]:
0     True
1    False
2    False
3     True
Name: A, dtype: bool

Upvotes: 6

HYRY

Reputation: 97331

If you really want to use isin() to match NaN, you can create a class that has the same hash as nan and returns True when compared to nan:

import numpy as np
import pandas as pd

class NAN(object):
    def __eq__(self, v):
        return np.isnan(v)

    def __hash__(self):
        return hash(np.nan)

nan = NAN()

df = pd.DataFrame([1,2,3,float('nan')], columns=['A'])
df.A.isin([1, nan])

Upvotes: 1

unutbu

Reputation: 880757

The float NaN has the interesting property that it is not equal to itself:

In [194]: float('nan') == float('nan')
Out[194]: False
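
The same property is why any membership test built on == misses NaN. Here is a small plain-Python sketch of that (no pandas involved; the variable names are only for illustration):

```python
import math

filter_list = [1, float('nan')]
value = float('nan')

# Equality-based membership: NaN never compares equal to anything,
# including another NaN, so this comes out False.
found_by_equality = any(value == v for v in filter_list)

# An explicit NaN test succeeds where equality does not.
found_by_isnan = any(isinstance(v, float) and math.isnan(v) for v in filter_list)

print(found_by_equality, found_by_isnan)
```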

isin checks for equality. So you can't use isin to check if a value equals NaN. To check for NaNs it is best to use pd.isnull (or the equivalent Series.isnull method):


In [200]: df['A'].isin([1]) | df['A'].isnull()
Out[200]: 
0     True
1    False
2    False
3     True
Name: A, dtype: bool
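
Since filter_list comes from the user and may or may not contain NaN, the same idea could be wrapped in a small helper. This is only a sketch: isin_with_nan is a hypothetical name (not a pandas function), and it assumes the list holds scalar values. It splits the NaN out of the list and ORs in isnull() only when the user actually asked for NaN:

```python
import pandas as pd

def isin_with_nan(series, values):
    """Hypothetical helper: like Series.isin, but a NaN in `values`
    also matches NaN entries in `series`. Assumes scalar values."""
    values = list(values)
    # Did the caller include a missing value (NaN or None) in the list?
    wants_nan = any(pd.isnull(v) for v in values)
    # Match the ordinary values with isin as usual.
    non_nan = [v for v in values if not pd.isnull(v)]
    mask = series.isin(non_nan)
    # Only match NaN rows when NaN was requested.
    if wants_nan:
        mask = mask | series.isnull()
    return mask

df = pd.DataFrame([1, 2, 3, float('nan')], columns=['A'])
result = isin_with_nan(df['A'], [1, float('nan')])
print(result)
```

With a list that has no NaN, such as [1], the helper falls back to a plain isin() and the NaN row is not matched.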

Upvotes: 8
