I am porting some code from Python 2 to Python 3. In my code, I define a data type for strings as:
MAX_WORD_LENGTH = 32
DT_WORD = np.dtype([('word', str('U') + str(MAX_WORD_LENGTH))])
Which shows up as:
>> DT_WORD.descr
[('word', '<U32')]
Now, when I create a basic numpy array, I get no errors:
>> import numpy as np
>> np.array(['a', 'b', 'c', 'd'])
array(['a', 'b', 'c', 'd'],
dtype='<U1')
But when I introduce my data type,
>> np.array(['a','b','c','d'], dtype=DT_WORD)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: a bytes-like object is required, not 'str'
What does this error mean? All strings in Python 3 are Unicode by default, so explicitly declaring the data type as Unicode shouldn't raise an error. How do I define my data type so that it accepts Unicode strings in both Python 2 and 3?
Upvotes: 0
Views: 553
I was able to eventually figure it out:
When a dtype has labelled fields, the array is actually a structured array. Structured arrays are created from a list of tuples (not simply a list of values). So:
np.array(['a','b','c','d'], dtype=DT_WORD)
Should be:
np.array([('a',), ('b',), ('c',), ('d',)], dtype=DT_WORD)
More concisely, if X is a list of strings, you can use:
np.array(list(zip(X)), dtype=DT_WORD)
This is compatible with both Python 2 and 3.
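To make the zip approach concrete, here is a minimal sketch that builds the structured array and reads the field back. The sample list `words` is made up for illustration, and I have only reasoned this through for Python 3 with a reasonably recent NumPy:

```python
import numpy as np

MAX_WORD_LENGTH = 32
DT_WORD = np.dtype([('word', 'U' + str(MAX_WORD_LENGTH))])

# Hypothetical sample data; any list of strings should work the same way.
words = ['a', 'b', 'c', 'd']

# zip() over a single list yields 1-tuples: ('a',), ('b',), ...
# which is the record shape a one-field structured dtype expects.
arr = np.array(list(zip(words)), dtype=DT_WORD)

print(arr.shape)     # one record per input string
print(arr['word'])   # indexing by field name gives back the strings
```

Indexing with `arr['word']` returns an ordinary Unicode array, so downstream code that expects plain strings can keep working on that field.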
Also, the same code raises a TypeError in Python 2 as well:
np.array(['a','b','c','d'], dtype=DT_WORD)
# Will give:
TypeError: expected a readable buffer
So my question was partly incorrect in the first place. It had less to do with the Python version than with the distinction between plain arrays and structured arrays.
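For anyone who wants to see the distinction side by side, here is a small sketch. The variable names are just for illustration. The last part shows an alternative I believe also works: allocate the structured array first, then assign a whole list to the field:

```python
import numpy as np

# Plain array: an unlabelled string dtype accepts a flat list of values.
plain = np.array(['a', 'b'], dtype='U32')

# Structured array: a labelled dtype expects one tuple per record.
structured = np.array([('a',), ('b',)], dtype=[('word', 'U32')])

print(plain.dtype.names)       # None, no fields
print(structured.dtype.names)  # ('word',)

# Alternative to building tuples: allocate first, then fill the field.
filled = np.zeros(2, dtype=[('word', 'U32')])
filled['word'] = ['a', 'b']
```

Checking `dtype.names` is a quick way to tell which kind of array you are dealing with before deciding how to construct it.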
Upvotes: 2