Load a NumPy array of strings in Python 3

Question

I am converting code from Python 2 to Python 3. Part of my code loads an array of strings that I saved earlier, and that array was originally saved under Python 2. In Python 2, I can simply load it as

import numpy as np
arr = np.load("path_to_string.npy")

and it gives me

arr = ['str1','str2' etc...]

However, when I do the same in Python 3, I get this instead:

arr = [b'str1',b'str2' etc...]

which I take to mean that the strings are stored as a different data type. I have tried to convert them using:

arr = [str(i) for i in arr]

but this just compounds the problem. Can someone explain why this happens and how to fix it? I'm sure it's trivial, but I'm drawing a blank.

ShadowRanger · Accepted Answer

To be clear, if they were strs in Python 2, then bytes in Python 3 is the "correct" type, in the sense that both of them store byte data; if you wanted arbitrary text data, you would use unicode in Python 2.
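This also explains why the `[str(i) for i in arr]` attempt in the question made things worse. A small illustration, assuming the stored bytes are plain ASCII (use `'utf-8'` or whatever encoding the data was actually written in otherwise): calling `str()` on a bytes object gives you its repr, `b'...'` prefix and quotes included, whereas `.decode()` actually converts the bytes to text.

```python
# Bytes objects, like the elements you get back from an 'S' (bytes) numpy array
raw = [b'str1', b'str2']

# str() on a bytes object returns its repr, including the b'' prefix and quotes
wrong = [str(i) for i in raw]
print(wrong)  # ["b'str1'", "b'str2'"]

# .decode() interprets the bytes as text in the given encoding (ASCII assumed here)
right = [i.decode('ascii') for i in raw]
print(right)  # ['str1', 'str2']
```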

For numpy, this is really the correct behavior: numpy doesn't want to silently convert from bytes-oriented data to text-oriented data. Among other issues, doing so would bloat memory usage by a factor of 4, since numpy's fixed-width Unicode dtype stores every character in four bytes, versus one byte per character for a bytes (`S`) array. If you really want to change from bytes to str, you can explicitly cast it, though it's a little bit hacky:

>>> arr  # Original version
array([[b'abc', b'123'],
       [b'foo', b'bar']], dtype='|S3')
>>> arr = arr.astype('U')  # Cast from "[S]tring" to "[U]nicode" equivalent
>>> arr
array([['abc', '123'],
       ['foo', 'bar']], dtype='<U3')
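For a self-contained version, here is a sketch that round-trips a bytes array through a `.npy` file in Python 3 (standing in for the file saved from Python 2; the file name `strings.npy` is just for illustration), casts it with `astype('U')`, and checks the 4x size difference via `itemsize`. The last step shows `np.char.decode`, which is not part of the answer above but is an alternative worth knowing if the bytes are not plain ASCII, because it lets you name the encoding explicitly (UTF-8 is assumed here).

```python
import os
import tempfile

import numpy as np

# A bytes ('S') array, like one built from Python 2 str data
arr = np.array([[b'abc', b'123'], [b'foo', b'bar']])

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name; the dtype survives the save/load round trip unchanged
    path = os.path.join(tmp, 'strings.npy')
    np.save(path, arr)
    loaded = np.load(path)

print(loaded.dtype)  # |S3

# Cast to fixed-width Unicode: each character now takes 4 bytes instead of 1
as_text = loaded.astype('U')
print(loaded.dtype.itemsize, as_text.dtype.itemsize)  # 3 12

# Alternative: element-wise bytes.decode with an explicit encoding (UTF-8 assumed)
decoded = np.char.decode(loaded, 'utf-8')
print(decoded.tolist())  # [['abc', '123'], ['foo', 'bar']]
```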
