Reputation: 1998
I have a table, and one column is loaded as np.str from a CSV file. But the dtype shows this weird U64 (I guess that means unsigned 64-bit int?), and converting with astype doesn't work.
stringIDs = extractedBatch.ID.astype(np.str)
After astype the dtype says 'object'
Upvotes: 4
Views: 17978
Reputation: 231335
In [313]: arr = np.array(['one','twenty'])
In [314]: arr
Out[314]: array(['one', 'twenty'], dtype='<U6')
In [315]: arr.astype(object)
Out[315]: array(['one', 'twenty'], dtype=object)
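On the U64 guess in the question: the U stands for a fixed-width Unicode string and the number is the maximum length in characters, not a bit width. A minimal sketch of how to check this (the variable names are mine, not from the question):

```python
import numpy as np

arr = np.array(['one', 'twenty'])

# 'U' is the dtype kind for fixed-width Unicode strings
kind = arr.dtype.kind

# The longest element ('twenty') has 6 characters, hence U6;
# dividing by the size of a one-character string recovers that count
max_chars = arr.dtype.itemsize // np.dtype('U1').itemsize

# astype(str) keeps a fixed-width string dtype,
# while astype(object) stores references to Python str objects
as_str = arr.astype(str)
as_obj = arr.astype(object)

print(kind, max_chars, as_str.dtype, as_obj.dtype)
```

So a column reported as U64 holds strings of up to 64 characters each.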
np.char applies string methods to elements of a string dtype array:
In [316]: np.char.add(arr, ' foo')
Out[316]: array(['one foo', 'twenty foo'], dtype='<U10')
The add ufunc is not defined for numpy string dtypes in the NumPy version used here:
In [317]: np.add(arr, ' foo')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-317-eff87c160b77> in <module>
----> 1 np.add(arr, ' foo')
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U6') dtype('<U6') dtype('<U6')
This use of np.add turns the 'foo' string into an array before trying to use it. It's trying to add a 'U6' string to a 'U6' string.
np.add, when applied to an object dtype array, delegates the action to the corresponding method of the elements. Since + is defined for Python strings, it works:
In [318]: np.add(arr.astype(object), ' foo')
Out[318]: array(['one foo', 'twenty foo'], dtype=object)
This pattern applies to all the numpy ufuncs. They are defined for specific dtypes. If given object dtypes, they delegate, which may or may not work depending on the methods of the elements.
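To illustrate the "may or may not work" part, here is a small sketch with a mixed object array. The array contents are made up for the example:

```python
import numpy as np

# Object array holding a str and an int (illustrative data only)
mixed = np.array(['one', 2], dtype=object)

try:
    # Delegates to element + ' foo'; fine for 'one', not for 2
    np.add(mixed, ' foo')
    outcome = 'worked'
except TypeError:
    outcome = 'TypeError'

# With elements that all support +, the same delegation succeeds
ints = np.array([1, 2], dtype=object)
summed = np.add(ints, 3)

print(outcome, summed)
```

The failure comes from the int element, since Python itself cannot add an int and a str.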
Both the object and np.char approaches do the equivalent of a list comprehension, and at about the same speed:
In [324]: [i+' foo' for i in arr]
Out[324]: ['one foo', 'twenty foo']
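As a quick check that the three approaches agree on the values (only the container and dtype differ), something like this should do:

```python
import numpy as np

arr = np.array(['one', 'twenty'])

via_char = np.char.add(arr, ' foo')               # fixed-width string array
via_object = np.add(arr.astype(object), ' foo')   # object array of Python str
via_list = [i + ' foo' for i in arr]              # plain Python list

# tolist() converts the array elements to Python str for comparison
same = via_char.tolist() == via_object.tolist() == via_list
print(same)
```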
===
Example with string replication (the * operator):
In [319]: arr*2
TypeError: ufunc 'multiply' did not contain a loop with signature matching types dtype('<U6') dtype('<U6') dtype('<U6')
In [320]: arr.astype(object)*2
Out[320]: array(['oneone', 'twentytwenty'], dtype=object)
In [322]: np.char.multiply(arr,2)
Out[322]: array(['oneone', 'twentytwenty'], dtype='<U12')
Upvotes: 3
Reputation: 375395
Pandas doesn't use a str dtype; it uses object (even if the underlying values are str):
In [11]: s = pd.Series(['a'], dtype='U64')
In [12]: type(s[0])
Out[12]: str
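If a NumPy fixed-width string array is what you need from the column, one option is to convert on the NumPy side rather than with the Series astype. A rough sketch, where the Series below is a hypothetical stand-in for extractedBatch.ID:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the ID column in the question
s = pd.Series(['id1', 'id22'])

# The individual values are ordinary Python str objects
first_type = type(s.iloc[0])

# Convert to a NumPy array first, then ask NumPy for a string dtype
string_ids = s.to_numpy().astype(str)

print(first_type, string_ids.dtype)
```

The resulting array has a U dtype sized to the longest ID, so np.char functions apply to it directly.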
Upvotes: 0