hazrmard

How to resolve type errors while creating arrays with custom dtypes in numpy?

I am porting some code from Python 2 to Python 3. In my code, I define a data type for strings as:

import numpy as np

MAX_WORD_LENGTH = 32
DT_WORD = np.dtype([('word', str('U') + str(MAX_WORD_LENGTH))])

Which shows up as:

>>> DT_WORD.descr
[('word', '<U32')]
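A dtype built from a list of `(name, type)` pairs is a structured dtype with named fields, whereas `'U32'` on its own is a plain string dtype. The sketch below inspects both using the standard `dtype.names` attribute. It is written for Python 3, so the `str()` wrappers are dropped, and `DT_PLAIN` is a name introduced here only for comparison.

```python
import numpy as np

MAX_WORD_LENGTH = 32
# Structured dtype: one named field, 'word', holding up to 32 Unicode characters.
DT_WORD = np.dtype([('word', 'U' + str(MAX_WORD_LENGTH))])
# Plain Unicode dtype with no field names (hypothetical, for comparison only).
DT_PLAIN = np.dtype('U' + str(MAX_WORD_LENGTH))

print(DT_WORD.names)    # ('word',)
print(DT_WORD['word'])  # <U32
print(DT_PLAIN.names)   # None
```

Only the first dtype has field names, which is what turns an array using it into a structured array.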

Now, when I create a basic numpy array, I get no errors:

>>> import numpy as np
>>> np.array(['a', 'b', 'c', 'd'])
array(['a', 'b', 'c', 'd'],
    dtype='<U1')

But when I introduce my data type,

>>> np.array(['a','b','c','d'], dtype=DT_WORD)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: a bytes-like object is required, not 'str'

What does this error mean? All strings in Python 3 are Unicode by default, so explicitly declaring the field as Unicode shouldn't cause an error. How do I define my data type so it accepts Unicode strings in both Python 2 and 3?

Answers (1)

hazrmard

I eventually figured it out:

When a dtype has named (labelled) fields, the array is actually a structured array. Structured arrays are created from a list of tuples, one tuple per element with one entry per field, not from a plain list of values. So:

np.array(['a','b','c','d'], dtype=DT_WORD)

Should be:

np.array([('a',), ('b',), ('c',), ('d',)], dtype=DT_WORD)
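The sketch below assumes Python 3 and the same one-field `DT_WORD` as in the question. It shows that the tuple form builds a 4-element structured array, and that its `'word'` field can be read back as an ordinary Unicode array.

```python
import numpy as np

DT_WORD = np.dtype([('word', 'U32')])

# Each element is a 1-tuple: one entry per field in DT_WORD.
arr = np.array([('a',), ('b',), ('c',), ('d',)], dtype=DT_WORD)

print(arr.shape)          # (4,)
# Indexing by field name returns a plain (non-structured) '<U32' array.
print(arr['word'])        # ['a' 'b' 'c' 'd']
print(arr['word'].dtype)  # <U32
```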

More concisely, if X is a list of strings, you can use:

np.array(list(zip(X)), dtype=DT_WORD)

This works in both Python 2 and 3.
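`zip` over a single iterable yields 1-tuples, which is exactly the element shape a one-field structured array expects. The sketch below assumes Python 3 and that `X` is a list of strings. It also shows a different approach that is not part of the answer above: allocating the structured array first and assigning the list to the field by name, which avoids building the tuples at all.

```python
import numpy as np

DT_WORD = np.dtype([('word', 'U32')])
X = ['a', 'b', 'c', 'd']

# zip over a single iterable yields 1-tuples: ('a',), ('b',), ...
print(list(zip(X)))  # [('a',), ('b',), ('c',), ('d',)]
arr_zip = np.array(list(zip(X)), dtype=DT_WORD)

# Alternative: allocate the structured array, then fill the field directly.
arr_fill = np.zeros(len(X), dtype=DT_WORD)
arr_fill['word'] = X

print(arr_zip['word'].tolist() == arr_fill['word'].tolist())  # True
```

Field assignment is often handy when a structured dtype has several fields that come from separate lists.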

The original call also raises a TypeError in Python 2:

np.array(['a','b','c','d'], dtype=DT_WORD)
# Will give:
TypeError: expected a readable buffer 

So my question was partly misframed: the error had less to do with the Python version than with the distinction between regular arrays and structured arrays.
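The distinction can be captured in a small helper. `make_array` below is hypothetical and is not a NumPy function. It is a sketch that passes values straight through for a plain dtype and wraps each value in a 1-tuple when the dtype is a structured dtype with exactly one field.

```python
import numpy as np


def make_array(values, dtype):
    """Hypothetical helper: build an array from a flat list of values.

    Plain dtypes take the values directly; one-field structured dtypes
    need each value wrapped in a 1-tuple.
    """
    dtype = np.dtype(dtype)
    if dtype.names is None:
        # Plain dtype: no field names, so a flat list is fine.
        return np.array(values, dtype=dtype)
    if len(dtype.names) != 1:
        raise ValueError("expected a structured dtype with exactly one field")
    # Structured dtype: one tuple per element, one entry per field.
    return np.array([(v,) for v in values], dtype=dtype)


words = make_array(['a', 'b', 'c', 'd'], [('word', 'U32')])
print(words['word'])  # ['a' 'b' 'c' 'd']
```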
