LarryZ

Error when using np.nan in vectorized functions

I am using Python 3 on 64-bit Windows 10. I had issues with the following simple function:

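import numpy as np

def skudiscounT(t):
    s = t.find("ITEMADJ")
    if s >= 0:
        # skip "ITEMADJ" (7 chars) plus the one separator character after it
        t = t[s + 8:]
        if t.find("-") == 2:
            return t
        # otherwise falls through and implicitly returns None
    else:
        return np.nan # if change to "" it will work fine!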

I tried to use this function with np.vectorize and got the following error:

Traceback (most recent call last):
  File "C:/Users/lz09/Desktop/P3/SODetails_Clean_V1.py", line 45, in <module>
    SO["SKUDiscount"] = np.vectorize(skudiscounT)(SO['Description'])
  File "C:\PD\Anaconda3\lib\site-packages\numpy\lib\function_base.py", line 2739, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "C:\PD\Anaconda3\lib\site-packages\numpy\lib\function_base.py", line 2818, in _vectorize_call
    res = array(outputs, copy=False, subok=True, dtype=otypes[0])
ValueError: could not convert string to float: '23-126-408'

When I change the last line from [return np.nan] to [return ''], it works fine. Does anyone know why this is the case? Thanks!
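A minimal, self-contained reproduction, assuming Description values shaped like 'ITEMADJ 23-126-408' (the two sample strings below are hypothetical). The first value has no ITEMADJ tag and returns np.nan; the second returns an SKU-like string, and the call should fail with the same kind of ValueError as above.

```python
import numpy as np

def skudiscounT(t):
    # Copy of the function from the question (behavior unchanged)
    s = t.find("ITEMADJ")
    if s >= 0:
        t = t[s + 8:]
        if t.find("-") == 2:
            return t
    else:
        return np.nan

# Hypothetical sample descriptions: no tag first, then a tagged SKU code
descriptions = ["abc", "ITEMADJ 23-126-408"]

try:
    np.vectorize(skudiscounT)(descriptions)
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)
```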

Answers (1)

hpaulj

Without otypes, the dtype of the returned array is determined by the result of the first trial call:

In [232]: f = np.vectorize(skudiscounT)
In [234]: f(['abc'])
Out[234]: array([ nan])
In [235]: _.dtype
Out[235]: dtype('float64')

I tried to find an argument that makes it return a string. It looks like your function can also return None (when "ITEMADJ" is found but the remaining text has no "-" at index 2).
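Tracing the function on a few hypothetical inputs makes its three possible return values explicit: np.nan when "ITEMADJ" is absent, a string when it is found and the remainder has "-" at index 2, and None when it is found but that check fails.

```python
import numpy as np

def skudiscounT(t):
    # Copy of the function from the question (behavior unchanged)
    s = t.find("ITEMADJ")
    if s >= 0:
        t = t[s + 8:]
        if t.find("-") == 2:
            return t
    else:
        return np.nan

# Hypothetical inputs, one per return path: nan, a string, None
samples = ["abc", "ITEMADJ 23-126-408", "23-126ITEMADJ408"]
results = [skudiscounT(s) for s in samples]

for s, r in zip(samples, results):
    print(repr(s), "->", repr(r))
```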

From the docs:

The data type of the output of vectorized is determined by calling the function with the first element of the input. This can be avoided by specifying the otypes argument.
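Because the output dtype comes from the first element, the same kinds of inputs in a different order can "succeed" silently. In this sketch (hypothetical inputs again) the first call returns a string, so np.vectorize picks a string dtype and the later np.nan and None results are stored as the text 'nan' and 'None'.

```python
import numpy as np

def skudiscounT(t):
    # Copy of the function from the question (behavior unchanged)
    s = t.find("ITEMADJ")
    if s >= 0:
        t = t[s + 8:]
        if t.find("-") == 2:
            return t
    else:
        return np.nan

# The string-producing input now comes first
ordered = ["ITEMADJ 23-126-408", "abc", "23-126ITEMADJ408"]
result = np.vectorize(skudiscounT)(ordered)
print(result, result.dtype)
```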

With otypes:

In [246]: f = np.vectorize(skudiscounT, otypes=[object])
In [247]: f(['abc', '23-126ITEMADJ408'])
Out[247]: array([nan, None], dtype=object)
In [248]: f = np.vectorize(skudiscounT, otypes=['U10'])
In [249]: f(['abc', '23-126ITEMADJ408'])
Out[249]: 
array(['nan', 'None'],
      dtype='<U4')
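Applied to inputs like the ones that failed in the question, otypes=[object] avoids the error: each result is kept as a Python object, so np.nan, the SKU string and None can sit in one array. A sketch with hypothetical sample strings:

```python
import numpy as np

def skudiscounT(t):
    # Copy of the function from the question (behavior unchanged)
    s = t.find("ITEMADJ")
    if s >= 0:
        t = t[s + 8:]
        if t.find("-") == 2:
            return t
    else:
        return np.nan

# Hypothetical inputs; the nan-producing one is first, as in the question
descriptions = ["abc", "ITEMADJ 23-126-408", "23-126ITEMADJ408"]
f = np.vectorize(skudiscounT, otypes=[object])
result = f(descriptions)
print(result)
```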

But if you want a generic object dtype, I'd use the slightly faster np.frompyfunc:

In [250]: g = np.frompyfunc(skudiscounT, 1,1)
In [251]: g(['abc', '23-126ITEMADJ408'])
Out[251]: array([nan, None], dtype=object)
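One detail worth keeping in mind: np.frompyfunc always returns an object array, whatever the Python function returns, so a numeric result has to be cast explicitly. A small sketch with a hypothetical helper, half_or_nan, that only ever returns floats:

```python
import numpy as np

def half_or_nan(x):
    # Hypothetical helper: halves non-negative inputs, np.nan otherwise
    return x / 2 if x >= 0 else np.nan

g = np.frompyfunc(half_or_nan, 1, 1)
raw = g(np.array([4, -1, 3]))
print(raw.dtype)  # object, even though every element is a float

# Explicit cast to get a real float array that holds nan
as_float = raw.astype(float)
print(as_float, as_float.dtype)
```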

So what kind of array do you want: a float array that can hold np.nan, a string array, or an object array that can hold 'anything'?
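If a string array is the goal (as with the '' workaround in the question), one option is a variant that returns '' on every no-match path, combined with otypes=[str] so the dtype no longer depends on which element comes first. The name skudiscount_str is hypothetical; this is a sketch, not the only way to do it.

```python
import numpy as np

def skudiscount_str(t):
    # Hypothetical variant of skudiscounT: always returns a str,
    # using "" instead of np.nan or None when no SKU code is found
    s = t.find("ITEMADJ")
    if s >= 0:
        rest = t[s + 8:]
        if rest.find("-") == 2:
            return rest
    return ""

# otypes=[str] fixes the output to a unicode string dtype up front
f = np.vectorize(skudiscount_str, otypes=[str])
result = f(["abc", "ITEMADJ 23-126-408", "23-126ITEMADJ408"])
print(result, result.dtype)
```

Empty strings then mark the rows with no SKU code, and the result no longer depends on input order.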
