Anton Bohdanov

Reputation: 53

Why does comparing to nan yield False (Python)?

Here, I have the following:

>>> import numpy as np
>>> q = np.nan
>>> q == np.nan
False
>>> q is np.nan
True
>>> q in (np.nan, )
True

So, the question is: why is nan not equal to nan, even though it is nan? And why does 'in' return True? I can't track down the implementation of nan. The trail leads to C:\Python33\lib\site-packages\numpy\core\umath.pyd (row NAN = nan), but from there I can find no way to trace what nan actually is.

Upvotes: 1

Views: 1632

Answers (1)

Denziloe

Reputation: 8131

Comparisons with nan, including ==, yield False (only != yields True). This is not something numpy invented: np.nan is an ordinary Python float, and float comparison follows the IEEE 754 floating-point standard, which defines NaN as unequal to every value, itself included. You can get the same behaviour for your own class in Python by defining an __eq__(self, other) method. The behaviour was chosen because it is useful in practice. After all, the fact that one entry has a missing value, and another entry also has a missing value, does not imply that those two entries are equal. It only means that you don't know whether they are equal, so it is best not to treat them as if they were (e.g. when you join two tables together by pairing up corresponding rows).
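As a rough illustration of the __eq__ remark, here is a minimal sketch of a class whose instances never compare equal, not even to themselves. The class name Missing is hypothetical and chosen only for this example; it is not how float or numpy implement nan.

```python
class Missing:
    """Hypothetical stand-in for a missing value (illustration only)."""

    def __eq__(self, other):
        # Never equal to anything, including itself, mimicking nan.
        return False

    # Defining __eq__ disables the default hash, so restore it explicitly.
    __hash__ = object.__hash__


m = Missing()
print(m == m)                        # False: __eq__ always says "not equal"
print(m is m)                        # True: identity ignores __eq__
print(float("nan") == float("nan"))  # False: the built-in float behaves the same way
```

The point of the sketch is that == is under the control of the object's class, while is is not.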

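is, on the other hand, is a Python operator that cannot be overloaded by numpy or by any other library. It tests whether two names refer to the same object. In your example q and np.nan are the same object, so q is np.nan is True. Do not rely on this to detect missing values, though: a nan produced elsewhere (for example by float('nan'), or read out of an array) is usually a different object, so is not np.nan will not filter it out. To get rid of entries that don't have a value, test with np.isnan or math.isnan instead.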
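A small sketch of why identity is a fragile test for nan. It uses only the standard library and assumes CPython, where each float("nan") call creates a new object; the variable names are made up for the example.

```python
import math

a = float("nan")
b = float("nan")  # a second, distinct nan object

values = [1.0, a, 3.0, b]

# Identity only removes the one object we compare against.
kept_identity = [v for v in values if v is not a]

# math.isnan checks the value itself, so every nan is removed.
kept_isnan = [v for v in values if not math.isnan(v)]

print(a is b)         # False
print(kept_identity)  # [1.0, 3.0, nan]
print(kept_isnan)     # [1.0, 3.0]
```

For numpy arrays the analogous check is np.isnan, which works elementwise.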

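q in (np.nan,) returns True because (np.nan,) is a tuple with a single element, and when Python checks whether an object is in a tuple, it compares by identity first and by equality second: the test succeeds if the object is, or ==, any element of the tuple. Here the identity check succeeds, so the failing == comparison never matters.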
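The containment rule can be sketched with plain floats. This is an illustration under the assumption of CPython's built-in tuple behaviour; the spelled-out any(...) line is only a rough equivalent of what in does.

```python
nan = float("nan")
container = (nan,)

# Same object: the identity check succeeds before == is consulted.
same_object = nan in container

# A different nan object: identity fails, and == is False as well.
different_object = float("nan") in container

# Rough spelled-out equivalent of the membership test.
spelled_out = any(x is nan or x == nan for x in container)

print(same_object)       # True
print(different_object)  # False
print(spelled_out)       # True
```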

Upvotes: 6
