Reputation: 1061
I want to create a NumPy array from a normal array and convert nan values to None, but the success depends on whether the first value is a "normal" float or a float('nan').
Here is my code, starting with the initial array:
print(a)
array('d', [3.2345, nan, 2.0, 3.2, 1.0, 3.0])
print(b)
array('d', [nan, nan, 2.0, 3.2, 1.0, 3.0])
Now I would like to swap all nan values to Python None via a vectorized function:
import numpy

def convert(x):
    # NaN is the only float value that compares unequal to itself
    if x != x:
        return None
    else:
        return x

convert_vec = numpy.vectorize(convert)
Simple, but leads to two different results:
numpy.asarray(convert_vec(a))
array([[ 3.2345, 2. , 1. ], [ nan, 3.2 , 3. ]])
numpy.asarray(convert_vec(b))
array([[None, 2.0, 1.0], [None, 3.2, 3.0]], dtype=object)
Why is this? Yes, I can see a small difference: the second one has object as dtype. But using numpy.asarray(convert_vec(a), dtype=object) only makes both arrays have object as dtype; it doesn't change the difference in results.
Upvotes: 5
Views: 21948
Reputation: 1378
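hpaulj has explained it well; here is an easy demonstration of how to do it:
import numpy
a = [3.2345, numpy.nan, 2.0, 3.2, 1.0, 3.0]
# i != i is true only for NaN; an identity test against numpy.nan would miss NaN objects created elsewhere, such as float('nan')
print([None if i != i else i for i in a])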
Upvotes: 0
Reputation: 231335
np.nan is a float value; None is not numeric.
In [464]: np.array([1,2,np.nan,3])
Out[464]: array([ 1., 2., nan, 3.])
In [465]: np.array([1,2,None,3])
Out[465]: array([1, 2, None, 3], dtype=object)
In [466]: np.array([1,2,None,3],dtype=float)
Out[466]: array([ 1., 2., nan, 3.])
If you try to create an array that contains None, the result will be a dtype=object array. If you insist on a float dtype, the None will be converted to nan.
In the vectorize case, if you don't specify the return dtype (the otypes argument), it is deduced from the result of calling the function on the first element.
Your examples are a bit confusing (you need to edit them), but I think that
convert(np.nan) => None
convert(123) => 123
so
convert_vec([123,nan,...]) => [123, nan, ...],dtype=float
convert_vec([nan,123,...]) => [None, 123,...],dtype=object
Trying to convert np.nan to None is a bad idea, except maybe for display purposes.
Using vectorize without an explicit result dtype specification is a bad idea.
This probably isn't a good use of vectorize.
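To illustrate the point about specifying the result dtype, here is a minimal sketch. It assumes the same convert function as in the question; the names inferred, explicit, res_a, res_b and fixed_a are made up for this example. It compares a vectorized function that infers its output dtype with one that is given otypes=[object] explicitly.

```python
import numpy as np

def convert(x):
    # NaN is the only float value that is not equal to itself
    return None if x != x else x

# Without otypes, the output dtype is taken from the first result
inferred = np.vectorize(convert)
# With otypes, the output dtype is fixed up front
explicit = np.vectorize(convert, otypes=[object])

a = np.array([3.2345, np.nan, 2.0])
b = np.array([np.nan, np.nan, 2.0])

res_a = inferred(a)    # first result is a float, so None is coerced back to nan
res_b = inferred(b)    # first result is None, so the dtype becomes object
fixed_a = explicit(a)  # object dtype regardless of which element comes first

print(res_a.dtype, res_b.dtype, fixed_a.dtype)
```

With otypes given, the result no longer depends on the order of the input values, which is the inconsistency described in the question.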
Here's an alternative way of converting the nan values:
In [467]: a=np.array([1,2,np.nan,34,np.nan],float)
In [468]: a
Out[468]: array([ 1., 2., nan, 34., nan])
In [471]: ind=a!=a
In [472]: ind
Out[472]: array([False, False, True, False, True], dtype=bool)
In [473]: a[ind]=0 # not trying None
In [474]: a
Out[474]: array([ 1., 2., 0., 34., 0.])
Or using masked arrays:
In [477]: am=np.ma.masked_invalid(a)
In [478]: am
Out[478]:
masked_array(data = [1.0 2.0 -- 34.0 --],
mask = [False False True False True],
fill_value = 1e+20)
In [479]: am.filled(0)
Out[479]: array([ 1., 2., 0., 34., 0.])
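If None is genuinely required, for example for display or for handing the values to code that expects None, the same boolean-mask idea can be applied to an object copy of the array. This is only a sketch of one possible approach, not something taken from the answer itself; the name as_objects is made up for illustration.

```python
import numpy as np

a = np.array([1, 2, np.nan, 34, np.nan], dtype=float)

# Copy into an object array so it can hold None alongside the floats
as_objects = a.astype(object)
# Boolean-mask assignment replaces every NaN position in one step
as_objects[np.isnan(a)] = None

print(as_objects.tolist())
```

The original float array a is left untouched, so numeric work can continue on it while the object copy is used only where None is needed.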
Upvotes: 4