Reputation: 689
I have a somewhat odd problem that probably stems from how indexing works in numpy, but I can't work out what is happening, let alone get the behavior I expect:
>>> a = np.array([['a', 'b'], ['c', 'd']], dtype='<U10')
>>> a
array([['a', 'b'],
['c', 'd']], dtype='<U10')
>>> a[0] = ['e']
>>> a
array([['e', 'e'],
['c', 'd']], dtype='<U10')
What I was expecting to obtain is
array([['e'], ['c', 'd']], dtype='<U10')
Can someone give me a hint as to why this is not working as I expected, and how to reach the expected behavior?
Also, and in reaction to roganjosh's comment:
>>> a = np.array([np.array(['a', 'b']), np.array(['c', 'd'])])
>>> a[0] = 'e'
>>> a
array([['e', 'e'],['c', 'd']], dtype=object)
However:
>>> a = np.array([np.array(['a', 'b', 'l']), np.array(['c', 'd'])])
>>> a[0] = 'e'
>>> a
array(['e', array(['c', 'd'], dtype='<U1')], dtype=object)
which feels sort of weird.
Thanks in advance!
Upvotes: 0
Views: 255
Reputation: 231665
In your last example, you make a 2-element array. Each element can be anything (a string, a list, or an array):
In [113]: a = np.array([np.array(['a', 'b', 'l']), np.array(['c', 'd'])])
<ipython-input-113-3010d1b297e2>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
a = np.array([np.array(['a', 'b', 'l']), np.array(['c', 'd'])])
Without the warning:
In [114]: a = np.array([np.array(['a', 'b', 'l']), np.array(['c', 'd'])],object)
...:
In [115]: a.shape
Out[115]: (2,)
In [116]: a
Out[116]:
array([array(['a', 'b', 'l'], dtype='<U1'),
array(['c', 'd'], dtype='<U1')], dtype=object)
In [117]: a[0]
Out[117]: array(['a', 'b', 'l'], dtype='<U1')
In [118]: a[0] = ['foobar']
In [119]: a
Out[119]: array([list(['foobar']), array(['c', 'd'], dtype='<U1')], dtype=object)
In [120]: a[0] = 'foobar'
In [121]: a
Out[121]: array(['foobar', array(['c', 'd'], dtype='<U1')], dtype=object)
This array behaves very much like a 2 element list. In fact I'd question the value of using such an array instead of a list.
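As a minimal sketch of the plain-list alternative (the name `rows` is only illustrative), a nested list gives the ragged result the question asks for, because each row is an independent Python object:

```python
# A nested list: each row is its own list, so rows may differ in length.
rows = [['a', 'b'], ['c', 'd']]

# Item assignment replaces the whole first row; nothing is broadcast.
rows[0] = ['e']

print(rows)
```

The trade-off is that none of numpy's multidimensional indexing (such as `b[:, 1]`) is available on a list of lists.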
Creating an object dtype array from arrays that all have the same shape can be tricky, because np.array tries to make a multidimensional array where possible (as in your original example). One way around that is to create an empty object array first and then fill it:
In [133]: a = np.empty(2,object) # 'blank' array with desired shape
In [134]: a
Out[134]: array([None, None], dtype=object)
In [135]: a[:] = [['a','b'],['c','d']] # assign 2 lists to it
In [136]: a
Out[136]: array([list(['a', 'b']), list(['c', 'd'])], dtype=object)
In [137]: a[1] = np.array(['a','b']) # assign an array to an element
In [138]: a
Out[138]: array([list(['a', 'b']), array(['a', 'b'], dtype='<U1')], dtype=object)
The display gives information about the array elements.
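The same fill-an-empty-array pattern can be wrapped in a small function. This is only a sketch: `as_object_array` is a hypothetical helper name, not a numpy function.

```python
import numpy as np

def as_object_array(items):
    """Hypothetical helper: store each item as one element of a 1-D object array."""
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        out[i] = item          # element assignment keeps the item intact
    return out

# Two arrays of equal shape stay separate elements instead of merging into 2-D.
a = as_object_array([np.array(['a', 'b']), np.array(['c', 'd'])])
print(a.shape)

# Each element can then be replaced by an object of any length.
a[0] = ['e']
print(a)
```

Filling element by element sidesteps the shape inference that np.array performs on nested input.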
The original example is a 2d array. The fact that it has a string dtype (or object) doesn't make much difference; it could just as well be a numeric array. You can't change the shape by assignment.
In [122]: b = np.array([['a', 'b'], ['c', 'd']], dtype='<U10')
In [123]: b
Out[123]:
array([['a', 'b'],
['c', 'd']], dtype='<U10')
In [124]: b.shape
Out[124]: (2, 2)
The regular multidimensional array indexing rules apply, including broadcasting.
In [125]: b[0]
Out[125]: array(['a', 'b'], dtype='<U10')
In [126]: _.shape
Out[126]: (2,)
In [127]: b[0] = 'd' # broadcast to the whole row
In [128]: b
Out[128]:
array([['d', 'd'],
['c', 'd']], dtype='<U10')
In [129]: b[0] = ['d','e'] # assign separate elements to the row
In [130]: b
Out[130]:
array([['d', 'e'],
['c', 'd']], dtype='<U10')
In [131]: b[:,1] = ['x','y'] # assign to a column
In [132]: b
Out[132]:
array([['d', 'x'],
['c', 'y']], dtype='<U10')
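To connect this back to the question, here is a small sketch of what happens with a one-element list and with a list that is too long. A value of shape (1,) broadcasts across the row, while a value of shape (3,) cannot be broadcast to shape (2,) and raises an error. In both cases the array keeps its shape.

```python
import numpy as np

b = np.array([['a', 'b'], ['c', 'd']], dtype='<U10')

# Shape (1,) broadcasts to the row's shape (2,), so both cells become 'e'.
b[0] = ['e']
print(b.tolist())

# Shape (3,) cannot be broadcast to shape (2,); numpy raises ValueError.
try:
    b[0] = ['x', 'y', 'z']
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# The array is still 2x2 either way.
print(b.shape)
```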
Look at how these arrays are converted to a list:
In [139]: a.tolist()
Out[139]: [['a', 'b'], array(['a', 'b'], dtype='<U1')]
In [140]: b.tolist()
Out[140]: [['d', 'x'], ['c', 'y']]
numpy is optimized for numeric multidimensional arrays. The fast compiled code works on numeric values. It can store strings and general objects, but processing them runs at Python speed, not at compiled speed.
Upvotes: 1