lakerz

Reputation: 1061

Numpy does treat float('nan') and float differently - convert to None

I want to create a Numpy array from a normal array and convert nan values to None, but whether this works depends on whether the first value is a "normal" float or a float('nan').

Here is my code, starting with the initial array:

print(a)
array('d', [3.2345, nan, 2.0, 3.2, 1.0, 3.0])
print(b)
array('d', [nan, nan, 2.0, 3.2, 1.0, 3.0])

Now I would like to replace all nan values with Python None via a vectorized function:

import numpy

def convert(x):
    # nan is the only float value that is not equal to itself
    if x != x:
        return None
    else:
        return x

convert_vec = numpy.vectorize(convert)

Simple, but leads to two different results:

numpy.asarray(convert_vec(a))

array([[ 3.2345,  2.    ,  1.    ], [    nan,  3.2   ,  3.    ]])

numpy.asarray(convert_vec(b))
array([[None, 2.0, 1.0], [None, 3.2, 3.0]], dtype=object)

Why is this? The one difference I can see is that the second result has object as dtype. But using numpy.asarray(convert_vec(a), dtype=object) only gives both results the object dtype; the nan in the first result is still not converted.

Upvotes: 5

Views: 21948

Answers (2)

bmbigbang

Reputation: 1378

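hpaulj has explained it well; here is a simple demonstration of how to do it with a plain list:

import numpy
a = [3.2345, numpy.nan, 2.0, 3.2, 1.0, 3.0]
# i != i is True only for nan; "is not numpy.nan" would miss other nan objects such as float('nan')
print([None if i != i else i for i in a])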

Upvotes: 0

hpaulj

Reputation: 231335

np.nan is a float value; None is not numeric.

In [464]: np.array([1,2,np.nan,3])
Out[464]: array([  1.,   2.,  nan,   3.])

In [465]: np.array([1,2,None,3])
Out[465]: array([1, 2, None, 3], dtype=object)

In [466]: np.array([1,2,None,3],dtype=float)
Out[466]: array([  1.,   2.,  nan,   3.])

If you try to create an array that contains None, the result will be a dtype=object array. If you insist on a float dtype, the None will be converted to nan.

In the vectorize case, if you don't specify the return dtype, it deduces it from the first element.

Your examples are a bit confusing (you need to edit them), but I think that

convert(np.nan) => None
convert(123) => 123

so

convert_vec([123,nan,...]) => [123, nan, ...],dtype=float
convert_vec([nan,123,...]) => [None, 123,...],dtype=object

  • trying to convert np.nan to None is a bad idea, except maybe for display purposes.

  • vectorize without explicit result dtype specification is a bad idea

  • this probably isn't a good use of vectorize.
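If vectorize is used anyway, the dtype deduction described above can be avoided by passing otypes explicitly. The following is only a sketch under that assumption; the shortened arrays a and b and the names res_a and res_b are made up for illustration and are not the asker's data:

```python
import numpy as np

def convert(x):
    # nan is the only float value that is not equal to itself
    return None if x != x else x

# otypes=[object] fixes the output dtype up front, so the result no longer
# depends on what convert() returns for the first element.
convert_vec = np.vectorize(convert, otypes=[object])

a = np.array([3.2345, np.nan, 2.0])   # starts with a normal float
b = np.array([np.nan, np.nan, 2.0])   # starts with nan

res_a = convert_vec(a)
res_b = convert_vec(b)

print(res_a.tolist())  # [3.2345, None, 2.0]
print(res_b.tolist())  # [None, None, 2.0]
```

Both results are object arrays, so they give up the speed and numeric behaviour of a float array; that is the trade-off the points above warn about.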

Here's an alternative way of converting the nan values:

In [467]: a=np.array([1,2,np.nan,34,np.nan],float)    
In [468]: a
Out[468]: array([  1.,   2.,  nan,  34.,  nan])
In [471]: ind=a!=a   
In [472]: ind
Out[472]: array([False, False,  True, False,  True], dtype=bool)

In [473]: a[ind]=0   # not trying None
In [474]: a
Out[474]: array([  1.,   2.,   0.,  34.,   0.])

Or using masked arrays:

In [477]: am=np.ma.masked_invalid(a)

In [478]: am
Out[478]: 
masked_array(data = [1.0 2.0 -- 34.0 --],
             mask = [False False  True False  True],
       fill_value = 1e+20)

In [479]: am.filled(0)
Out[479]: array([  1.,   2.,   0.,  34.,   0.])
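If None is really wanted in the end (for display, say), the same boolean-mask idea can be applied to an object copy of the array. This is a minimal sketch rather than a recommendation; the name out is chosen here for illustration, and np.isnan(a) is used as an equivalent of the a!=a mask above:

```python
import numpy as np

a = np.array([1, 2, np.nan, 34, np.nan], dtype=float)

# Cast to object first so the array can hold None alongside the floats.
out = a.astype(object)

# np.isnan(a) yields the same boolean mask as a != a.
out[np.isnan(a)] = None

print(out.tolist())  # [1.0, 2.0, None, 34.0, None]
```

The original float array a is left untouched, so numeric work can continue on it while out is used only where None is required.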

Upvotes: 4
