Reputation: 1199
I have a dictionary that maps numeric values to labels. I use it to create labels for a given numpy array. The array initially contains all NaN values and some elements get populated with non-NaN values. I want to map NaN values to a label. However, this fails:
import numpy as np
# make array with all NaNs
a = np.ones(5) * np.nan
# populate some of it with non-NaN values
a[0] = 1
a[1] = 2
l = {1: "one", 2: "two", np.nan: "NA"}
for k in l:
    if k == np.nan:
        print(l[k])
# this returns false
print (np.nan in a)
Is this because of the initialization of the array? Why is np.nan not equal to the NaN values in a?
I am trying to get a working version of:
print(l[a[3]])  # should print "NA", not raise KeyError
Upvotes: 0
Views: 3514
Reputation: 85442
You can create your own dictionary that handles NaN the way you want:
class MyDict(dict):
    def __getitem__(self, key):
        try:
            if np.isnan(key):
                return 'NA'
        except TypeError:
            pass
        return super(MyDict, self).__getitem__(key)

    def __contains__(self, key):
        try:
            self.__getitem__(key)
            return True
        except KeyError:
            return False
Test it:
>>> l = MyDict({1: "one", 2: "two"})
>>> l[a[3]]
'NA'
>>> l[a[0]]
'one'
>>> np.nan in l
True
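One caveat worth checking: in CPython, built-in dict methods such as get do not go through an overridden __getitem__, so l.get(np.nan) would not return 'NA' with the class as written. The sketch below assumes Python 3 and repeats the class so that it runs on its own; the get override and the "?" default are additions for illustration, not part of the answer above.

```python
import numpy as np

class MyDict(dict):
    def __getitem__(self, key):
        try:
            if np.isnan(key):
                return 'NA'
        except TypeError:
            # Non-numeric keys (e.g. strings) cannot be NaN.
            pass
        return super().__getitem__(key)

    def get(self, key, default=None):
        # Added for illustration: reuse the NaN-aware __getitem__.
        try:
            return self[key]
        except KeyError:
            return default

l = MyDict({1: "one", 2: "two"})
a = np.array([1.0, 2.0, np.nan])

# Label every element, falling back to "?" for unknown values.
labels = [l.get(x, "?") for x in a]
missing = l.get(5, "?")
```

With this override, get and [] agree on NaN keys, which makes the list comprehension safe for values that have no label.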
Upvotes: 0
Reputation: 309929
One interesting thing about NaN is that IEEE 754 specifies that NaN doesn't equal anything (including itself). NumPy and Python in general follow this rule.
>>> NaN = float('nan')
>>> NaN == NaN
False
>>> import numpy as np
>>> np.nan == np.nan
False
This should explain why your print l[k] statement doesn't ever print and why np.nan in a doesn't return True.
One workaround might be:
np.isnan(a).any()  # Check if any element in `a` is NaN.
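A minimal sketch of these comparisons side by side, assuming Python 3; the sample array is made up for illustration:

```python
import numpy as np

nan = float("nan")

# NaN compares unequal to everything, including itself.
eq_self = (nan == nan)
ne_self = (nan != nan)

a = np.array([1.0, 2.0, np.nan])

# `x in array` is evaluated with ==, so it cannot find NaN.
found_by_in = bool(np.nan in a)
# np.isnan tests each element explicitly.
found_by_isnan = bool(np.isnan(a).any())

print(eq_self, ne_self, found_by_in, found_by_isnan)
```

Only the explicit np.isnan test finds the NaN; both equality-based checks miss it.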
If I understand your comment correctly, the problem is more appropriately demonstrated by the following snippet of code:
>>> import numpy as np
>>> d = {np.nan: 'foo'}
>>> d[np.nan]
'foo'
>>> a = np.array([np.nan])
>>> d[a[0]]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: nan
Unfortunately, there's not much you can do here due to the crazy properties of NaN. numpy arrays are essentially C arrays that hold raw floating-point numbers, not Python objects. A free-floating np.nan is a single Python object whose ID (memory address) never changes, and a dict lookup checks identity before it checks equality, so Python can find the key by pointer comparison. This is why the first lookup in the dict above worked.
When you put a NaN into an array, however, only its value is copied into the array's buffer. Indexing the array then builds a new scalar object, so Python can't tell that this NaN is the same as the one you used to construct the array (because it isn't). Since the ID comparison now fails and the equality comparison fails due to the properties of NaN, you're a bit out of luck.
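To make the identity-versus-equality point concrete, here is a small sketch, assuming Python 3. The lookup function is a hypothetical helper (not part of numpy) that routes any NaN to the single np.nan object before the dictionary lookup:

```python
import numpy as np

d = {np.nan: "NA", 1: "one", 2: "two"}
a = np.array([1.0, np.nan])

# Indexing builds a fresh scalar object, so it is never the np.nan object.
same_object = a[1] is np.nan

def lookup(mapping, key):
    """Hypothetical helper: route every NaN to the np.nan key."""
    if isinstance(key, float) and np.isnan(key):
        key = np.nan  # the one object that was stored in the dict
    return mapping[key]

labels = [lookup(d, x) for x in a]
print(same_object, labels)
```

The isinstance check relies on np.float64 being a subclass of Python's float, so non-float keys such as strings pass through unchanged.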
As for your value -> label conversion, you can probably use numpy builtin functionality:
label_array = np.empty(a.shape, dtype='|S3')
label_array[np.isnan(a)] = 'NA'
label_array[a == 1] = 'one'
label_array[a == 2] = 'two'
For moderately sized arrays, this should be fast enough...
Note: this really only works if you've put the ones and twos in a directly, not if you've done some floating-point math to compute them (e.g. a[n] = 5. / 2.5), as precision errors could leave you with numbers really close to 2 that don't quite equal 2.
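If the values may come from floating-point arithmetic, one option is a tolerance-based comparison. The following is a sketch, assuming Python 3 and np.isclose with its default tolerances; the sample values and the empty-string default are illustrative choices:

```python
import numpy as np

# 2.0 nudged up by one representable step: close to 2, but not equal to it.
almost_two = np.nextafter(2.0, 3.0)
a = np.array([1.0, almost_two, np.nan])

# Unicode dtype so the labels are str (not bytes) on Python 3.
label_array = np.full(a.shape, "", dtype="U3")
label_array[np.isnan(a)] = "NA"
label_array[np.isclose(a, 1)] = "one"
label_array[np.isclose(a, 2)] = "two"

print(label_array.tolist())
```

Elements that match none of the masks keep the empty-string default, which avoids the uninitialized contents that np.empty would leave behind.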
Upvotes: 4
Reputation: 77847
NaN fails every equality comparison, including against itself, i.e. NaN == NaN is False.
Thus, your statement
if k == np.nan:
must return False for all values of k. Instead, try this:
if not k == k:
    print(l[k])
This yields the desired "NA" output.
Note that you can equally write
if k != k:
since, among floats, NaN is the only value that is unequal to itself, so this test is True only for NaN.
Does this work for you?
import numpy as np
# make array with all NaNs
a = np.ones(5) * np.nan
# populate some of it with non-NaN values
a[0] = 1
a[1] = 2
a[3] = 1
l = {1: "one", 2: "two", "NaN": "NA"}
for k in l:
    if not k == k:
        print(l[k])
# this returns false
print (np.nan in a)
a_label = [l[a[n]] if a[n] in l else l["NaN"] for n in range(len(a))]
print(a_label)
Output:
False
['one', 'two', 'NA', 'one', 'NA']
Upvotes: 0