mvd

Reputation: 1199

How to map NaN in numpy to values using dictionary?

I have a dictionary that maps numeric values to labels. I use it to create labels for a given numpy array. The array initially contains all NaN values and some elements get populated with non-NaN values. I want to map NaN values to a label. However, this fails:

import numpy as np
# make array with all NaNs
a = np.ones(5) * np.nan
# populate some of it with non-NaN values
a[0] = 1
a[1] = 2
l = {1: "one", 2: "two", np.nan: "NA"}
for k in l:
  if k == np.nan:
    print(l[k])
# this prints False
print(np.nan in a)

Is this because of the initialization of the array? Why is np.nan not equal to the NaN values in a?

I am trying to get a working version of:

print(l[a[3]])  # should print "NA", not raise KeyError

Upvotes: 0

Views: 3514

Answers (3)

Mike Müller

Reputation: 85442

You can create your own dictionary that handles NaN the way you want it:

class MyDict(dict):

    def __getitem__(self, key):
        try:
            if np.isnan(key):
                return 'NA'
        except TypeError:
            pass
        return super(MyDict, self).__getitem__(key)

    def __contains__(self, key):
        try:
            self.__getitem__(key)
            return True
        except KeyError:
            return False

Test it:

>>> l = MyDict({1: "one", 2: "two"})
>>> l[a[3]]
'NA'
>>> l[a[0]]
'one'
>>> np.nan in l
True

Upvotes: 0

mgilson

Reputation: 309929

One interesting thing about NaN is that IEEE 754 specifies that NaN doesn't equal anything (including itself). NumPy and Python in general follow this rule.

>>> NaN = float('nan')
>>> NaN == NaN
False
>>> import numpy as np
>>> np.nan == np.nan
False

This should explain why your print l[k] statement never runs and why np.nan in a doesn't return True.

One workaround might be:

np.isnan(a).any()  # Check if any element in `a` is `nan`.
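
As a minimal sketch, here is that check applied to an array built the same way as in the question. The name `nan_mask` is introduced here purely for illustration.

```python
import numpy as np

# Rebuild the array from the question: all NaN, then two real values.
a = np.ones(5) * np.nan
a[0] = 1
a[1] = 2

# Membership testing relies on ==, so it never finds NaN.
print(np.nan in a)            # False

# np.isnan tests each element directly and returns a boolean mask.
nan_mask = np.isnan(a)
print(nan_mask.any())         # True
print(int(nan_mask.sum()))    # 3
```

The same mask can be used to index the array, which is usually more convenient than looping over elements.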

If I understand your comment correctly, the problem is more appropriately demonstrated by the following snippet of code:

>>> import numpy as np
>>> d = {np.nan: 'foo'}
>>> d[np.nan]
'foo'
>>> a = np.array([np.nan])
>>> d[a[0]]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: nan

Unfortunately, there's not much you can do here, given the odd properties of NaN. NumPy arrays are essentially C arrays that hold raw floating-point numbers. A free-floating np.nan is a single Python object whose ID (memory address) never changes, and a dict lookup checks object identity before it checks equality, so Python can find the key with a pointer comparison. This is why the first lookup in the dict above worked.

When you put a NaN into an array, though, only the floating-point value is stored, not a reference to the np.nan object. Reading an element back out creates a new scalar object, so Python can't tell that this NaN is the same as the one you used to construct the array (because it isn't). Since the identity comparison now fails, and the equality comparison fails due to the properties of NaN, you're a bit out of luck.
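
The identity-versus-equality behaviour can be seen directly in a short sketch. The `lookup` helper below is hypothetical (it is not part of NumPy or the question's code); it simply routes NaN keys to a fixed label before falling back to a normal dict lookup.

```python
import numpy as np

d = {np.nan: "NA", 1: "one", 2: "two"}
a = np.array([np.nan, 1.0])

# The module-level constant is a single object, so identity holds...
print(np.nan is np.nan)       # True
# ...but a value read back out of an array is a different object,
print(a[0] is np.nan)         # False
# and NaN never compares equal, so the dict has nothing to match on.
print(a[0] == np.nan)         # False

def lookup(mapping, key, na_label="NA"):
    """Hypothetical helper: route NaN keys to na_label, else do a normal lookup."""
    if isinstance(key, (float, np.floating)) and np.isnan(key):
        return na_label
    return mapping[key]

print(lookup(d, a[0]))        # NA
print(lookup(d, a[1]))        # one
```

Checking for NaN before touching the dict sidesteps the problem entirely, at the cost of hard-coding the NaN label outside the mapping.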

As for your value -> label conversion, you can probably use NumPy's built-in functionality:

label_array = np.empty(a.shape, dtype='U3')  # 'U3': text up to 3 characters (use '|S3' for bytes)
label_array[np.isnan(a)] = 'NA'
label_array[a == 1] = 'one'
label_array[a == 2] = 'two'

For moderately sized arrays, this should be fast enough...

Note: this really only works if you've put the ones and twos into a directly, not if you've computed them with floating-point math. Rounding errors could leave you with numbers really close to 2 that don't quite equal 2 (in the same way that 0.1 + 0.2 == 0.3 is False).
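
If the values do come out of floating-point arithmetic, one possible workaround is to compare within a tolerance using np.isclose instead of ==. The following is only a sketch: the sample values and the "0.3" label are made up for illustration, and the default tolerances may not suit every dataset.

```python
import numpy as np

# 0.1 + 0.2 is not exactly 0.3 in binary floating point.
a = np.array([0.1 + 0.2, 0.3, np.nan])

# Exact comparison misses the computed value because of rounding error.
print((a == 0.3).tolist())            # [False, True, False]

# np.isclose compares within a tolerance; NaN is never "close" by default.
print(np.isclose(a, 0.3).tolist())    # [True, True, False]

# Build labels from the tolerant mask, with NaN handled separately.
labels = np.full(a.shape, "??", dtype="U3")
labels[np.isnan(a)] = "NA"
labels[np.isclose(a, 0.3)] = "0.3"
print(labels.tolist())                # ['0.3', '0.3', 'NA']
```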

Upvotes: 4

Prune

Reputation: 77847

NaN fails every equality check, including against itself, i.e.

NaN == NaN

is False.

Thus, your statement

if k == np.nan:

evaluates to False for every value of k. Instead, try this:

if not k == k:
  print(l[k])

This yields the desired "NA" output.
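
A self-contained sketch of that loop, assuming a dictionary with a NaN key like the one in the question. The `nan_keys` list is a name added here for illustration, and `math.isnan` is shown as a more explicit alternative to the self-comparison trick.

```python
import math
import numpy as np

l = {1: "one", 2: "two", np.nan: "NA"}

for k in l:
    # NaN is the only float that is not equal to itself.
    if not k == k:
        print(l[k])           # NA

# math.isnan states the intent more directly, but only accepts numbers,
# so guard it when the keys may include strings or other types.
nan_keys = [k for k in l if isinstance(k, float) and math.isnan(k)]
print(len(nan_keys))          # 1
```

The lookup l[k] succeeds inside the loop because k is the very same object that is stored as the key, so the dict matches it by identity.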

Note that you can also write this test as

if k != k:

since != is the one comparison that returns True for NaN.


Does this work for you?

import numpy as np
# make array with all NaNs
a = np.ones(5) * np.nan
# populate some of it with non-NaN values
a[0] = 1
a[1] = 2
a[3] = 1
l = {1: "one", 2: "two", "NaN": "NA"}
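for k in l:
  if not k == k:  # no key is NaN here, so nothing is printed
    print(l[k])
# this prints False
print(np.nan in a)

# fall back to the "NaN" entry for any value that is not a key
a_label = [l[a[n]] if a[n] in l else l["NaN"] for n in range(len(a))]
print(a_label)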

Output:

False
['one', 'two', 'NA', 'one', 'NA']

Upvotes: 0
