Reputation: 614
I am converting code from Python 2 to Python 3. Part of my code loads an array of strings that I saved earlier; the array was originally saved in Python 2. In Python 2, I can simply load it as
arr = np.load("path_to_string.npy")
and it gives me
arr = ['str1','str2' etc...]
However, when I do the same in Python 3, I get this instead:
arr = [b'str1',b'str2' etc...]
which I take to mean that the strings are stored as a different data type. I have tried to convert them using:
arr = [str(i) for i in arr]
but this just compounds the problem, since each element becomes a string like "b'str1'". Can someone explain why this happens and how to fix it? I'm sure it's trivial, but I'm drawing a blank.
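The behavior can be reproduced without a Python 2 file. This is a minimal sketch that assumes, as the question implies, that the Python 2 array of str had an 'S' (byte string) dtype; the in-memory buffer and the sample values are hypothetical stand-ins for "path_to_string.npy".

```python
import io

import numpy as np

# Hypothetical stand-in for the saved file: an 'S' (byte string) array,
# which is the dtype a Python 2 array of str ends up with.
original = np.array(['str1', 'str2'], dtype='S4')

# Round-trip through an in-memory .npy file instead of a path on disk.
buf = io.BytesIO()
np.save(buf, original)
buf.seek(0)
arr = np.load(buf)

print(arr.dtype)     # |S4
print(arr.tolist())  # [b'str1', b'str2']
```

np.load returns the data with the dtype it was saved with, so the elements come back as bytes rather than str.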
Upvotes: 0
Views: 352
Reputation: 155323
To be clear, if they were str objects in Python 2, then bytes in Python 3 is the "correct" type, in the sense that both of them store byte data; if you had wanted arbitrary text data, you would have used unicode in Python 2.
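A minimal pure-Python sketch of the distinction: elements of an 'S' array behave like bytes (numpy's bytes_ scalar type subclasses bytes), so str() returns their repr rather than decoded text, while .decode() performs the actual conversion. ASCII content is assumed, and the sample values are placeholders.

```python
# Placeholder value standing in for an element loaded from an 'S'-dtype array.
raw = b'str1'
print(type(raw))  # <class 'bytes'>

# str() on a bytes object returns its repr, not the decoded text;
# this is why [str(i) for i in arr] produced "b'str1'"-style strings.
wrong = str(raw)
print(wrong)  # b'str1'

# .decode() is the explicit bytes -> str conversion (ASCII assumed here).
right = raw.decode('ascii')
print(right)  # str1

# The same fix applied to a whole list of byte strings.
items = [b'str1', b'str2']
decoded = [s.decode('ascii') for s in items]
print(decoded)  # ['str1', 'str2']
```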
For numpy, this is really the correct behavior; numpy doesn't want to silently convert from bytes-oriented data to text-oriented data (among other issues, doing so would bloat memory usage by a factor of 4, since numpy's fixed-width Unicode representation uses four bytes per character, versus one byte per character for bytes). If you really want to change from bytes to str, you can explicitly cast it, though it's a little bit hacky:
>>> arr  # Original version
array([[b'abc', b'123'],
       [b'foo', b'bar']], dtype='|S3')
>>> arr = arr.astype('U')  # Cast from "[S]tring" to "[U]nicode" equivalent
>>> arr
array([['abc', '123'],
       ['foo', 'bar']], dtype='<U3')
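The session above can be sketched as a standalone script that also checks the memory claim through dtype.itemsize. The np.char.decode call is an alternative not mentioned in this answer; it is included on the assumption that the bytes might be UTF-8 encoded, since it lets you name the codec explicitly instead of relying on the cast.

```python
import numpy as np

# Byte-string array like one saved from Python 2 (dtype 'S3': 3 bytes per item).
arr = np.array([[b'abc', b'123'], [b'foo', b'bar']], dtype='S3')

# Cast to numpy's fixed-width Unicode dtype; the length (3) carries over.
as_text = arr.astype('U')

# Alternative: decode with an explicitly named codec (UTF-8 assumed here).
decoded = np.char.decode(arr, 'utf-8')

print(arr.dtype.itemsize)      # 3  (one byte per character)
print(as_text.dtype.itemsize)  # 12 (four bytes per character)
print(as_text.tolist())        # [['abc', '123'], ['foo', 'bar']]
```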
Upvotes: 2