Reputation: 446
I am working with numpy arrays filled with strings. My goal is to assign to a slice of a first array a the values contained in a second array b of smaller size.
The implementation that I had in mind is the following:
import numpy as np
a = np.empty((10,), dtype=str)
b = np.array(['TEST' for _ in range(2)], dtype=str)
print(b)
a[1:3] = b
print(a)
print(b) returns, as expected, ['TEST' 'TEST']
But then print(a) returns ['' 'T' 'T' '' '' '' '' '' '' '']. Therefore the values from b are not correctly assigned to the slice of a.
Any idea of what is causing this wizardry?
Thanks!
Upvotes: 3
Views: 8042
Reputation: 947
You can see it as a form of overflow.
Have a look at the exact types of your arrays:
>>> a.dtype
dtype('<U1') # Array of 1 unicode char
>>> b.dtype
dtype('<U4') # array of 4 unicode chars
When you define an array of strings, numpy tries to infer the smallest string size that can contain all the elements you defined.
For a, 1 character is enough.
For b, TEST is 4 chars long.
Then, when you assign a new value to any element of an array of strings, numpy will truncate the new value to the capacity of the array. Here, it keeps only the first letter of TEST, T.
Your slicing operation has nothing to do with it:
a = np.zeros(1, dtype=str)
a[0] = 'hello world'
print(a[0])
# h
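To spot this before assigning, the per-cell capacity of both arrays can be compared. The following is only a minimal sketch: str_width is a hypothetical helper name (not part of numpy), and it assumes fixed-width unicode arrays, where numpy stores each character in 4 bytes.

```python
import numpy as np

def str_width(arr):
    """Hypothetical helper: characters per cell of a fixed-width unicode array."""
    # Only meaningful for unicode ('U') dtypes, which use 4 bytes per character.
    assert arr.dtype.kind == 'U'
    return arr.dtype.itemsize // 4

a = np.zeros(10, dtype=str)       # dtype=str gives 1-character cells
b = np.array(['TEST', 'TEST'])    # width inferred from the longest element

print(str_width(a), str_width(b))
if str_width(b) > str_width(a):
    print("assigning b into a would truncate its values")
```

If the source array is wider than the destination, the assignment will silently cut the strings, as in the example above.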
How to overcome it
Define a with a dtype of object: numpy will not try to optimize its storage space anymore, and you'll get a predictable behaviour.
Alternatively, a = np.zeros(10, dtype='U256') will increase the capacity of each cell to 256 characters.
Upvotes: 6
Reputation: 481
The problem is that numpy truncates the string to length 1 when specifying dtype=str.
You can resolve the issue by using dtype='<U4' though.
So the following code would work for your case:
import numpy as np
a = np.empty((10,), dtype='<U4')
b = np.array(['TEST' for _ in range(2)], dtype=str)
print(b)
a[1:3] = b
print(a)
The number in dtype='<U4' specifies the maximum possible length for a string in that array, so for your case 4 is fine since 'TEST' only has 4 letters.
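If you would rather not hard-code the width, one possible approach (a sketch, not the only way) is to reuse b's own dtype when creating a, or to use dtype=object so that no fixed width applies. The name a_obj below is just illustrative.

```python
import numpy as np

b = np.array(['TEST' for _ in range(2)], dtype=str)

# Option 1: reuse b's dtype so each cell of a is wide enough for b's values.
a = np.zeros(10, dtype=b.dtype)   # zeros yields empty strings for string dtypes
a[1:3] = b
print(a)

# Option 2: object dtype stores Python str objects, so there is no fixed width.
a_obj = np.full(10, '', dtype=object)
a_obj[1:3] = b
print(a_obj)
```

Option 1 still truncates anything longer than the longest string in b, so it only fits when b is the widest data you will ever store in a.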
Upvotes: 2