Reputation: 2229
Now I'm trying to understand the possible ways to index numpy structured arrays, and I'm getting a bit stuck. Just a couple of simple examples:
```python
import numpy as np
# list() is needed on Python 3, where zip() returns an iterator
arr = np.array(list(zip(range(5), range(5, 10))), dtype=[('a', int), ('b', int)])
arr[0]          # first row (record)
arr[(0,)]       # the same, as expected
arr['a']        # field 'a' of each record
# (error messages below are from Python 2 and an older NumPy; current NumPy raises IndexError for all of them)
arr[('a',)]     # "IndexError: unsupported iterator index" ?!
arr[1:3]        # second and third rows (records)
arr[1:3, 'a']   # "ValueError: invalid literal for long() with base 10: 'a'" ?!
arr['a', 1:3]   # same error
arr[..., 'a']   # here too...
arr['a', ...]   # and here
```
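So, two subquestions arise:
1. Why is indexing by a single value ('a' in this case) different from indexing by the corresponding singleton tuple (('a',))?
2. Is there a way to get the same result as arr['a'][1:3] with a single index? As you can see, the obvious arr['a', 1:3] doesn't work.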
I also looked at the indexing behavior of the built-in list and of non-structured ndarray, but couldn't find such issues there: putting a single value in a tuple doesn't change anything, and of course indexing like arr[1, 1:3] works as expected for a plain ndarray. Given that, should the errors in my example be considered bugs in numpy?
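For comparison, here is a quick check of the plain-ndarray behavior described above. The 2-D array plain is an arbitrary example chosen for illustration; it is not part of the original question:

```python
import numpy as np

# Hypothetical 2-D, non-structured array: [[0 1 2 3 4], [5 6 7 8 9]]
plain = np.arange(10).reshape(2, 5)

# Wrapping a single index in a tuple changes nothing
print(np.array_equal(plain[0], plain[(0,)]))  # True

# One index per dimension: an integer for the rows, a slice for the columns
print(plain[1, 1:3].tolist())  # [6, 7]
```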
Upvotes: 3
Views: 353
Reputation: 74182
First, fields are not the same thing as dimensions: although your array arr has two fields and five rows, numpy treats it as one-dimensional (it has shape (5,)). Second, tuples have a special status when used as indices into numpy arrays. When you put a tuple inside the square indexing brackets, numpy interprets it as a sequence of indices, one for each dimension of the array, in order. In the special case of nested tuples, each inner tuple is treated as a sequence of indices into that dimension (as if it were a list).
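A minimal sketch that makes both points concrete, using the array from the question (rebuilt with list(zip(...)) so it also runs on Python 3): the array has two fields but only one dimension, and an inner tuple in the index acts like a list of row positions:

```python
import numpy as np

# The array from the question; list() is needed for zip() on Python 3
arr = np.array(list(zip(range(5), range(5, 10))), dtype=[('a', int), ('b', int)])

# Two fields, but only one dimension
print(arr.shape)        # (5,)
print(arr.ndim)         # 1
print(arr.dtype.names)  # ('a', 'b')

# The index here is the tuple ((0, 2),); its inner tuple is treated like
# the list [0, 2] of row indices (fancy indexing on the only dimension)
print(arr[(0, 2),]['a'].tolist())  # [0, 2]
```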
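Since fields don't count as dimensions, when you index with arr[('a',)], numpy interprets 'a' as an index into the rows of arr. The IndexError is therefore raised because a string isn't a valid type for indexing into a dimension of an array (what is the 'a'th row?). A bare string, as in arr['a'], is different: numpy special-cases it as a field name, so it never goes through the per-dimension index handling.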
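The difference between a bare string and a string wrapped in a tuple can be checked directly. The helper raises_index_error is a hypothetical name introduced only for this sketch:

```python
import numpy as np

arr = np.array(list(zip(range(5), range(5, 10))), dtype=[('a', int), ('b', int)])

def raises_index_error(index):
    """Hypothetical helper: return True if arr[index] raises IndexError."""
    try:
        arr[index]
    except IndexError:
        return True
    return False

print(arr['a'].tolist())           # [0, 1, 2, 3, 4]  (field access)
print(raises_index_error('a'))     # False
print(raises_index_error(('a',)))  # True: 'a' is now treated as a row index
```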
The same thing happens when you try arr['a', 1:3], because this is equivalent to indexing with the tuple ('a', slice(1, 3, None)). The comma between 'a' and 1:3 is what makes it a tuple, even without parentheses. Again, numpy tries to index into the rows of arr with 'a', which is invalid. And even if both elements were valid index types, you would still get an IndexError, since the length of your tuple (2) is greater than the number of dimensions of arr (1).
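To see the tuple that numpy actually receives, a tiny class whose __getitem__ returns its key is enough; ShowIndex is a hypothetical helper for illustration only. The second half shows the "too many indices" IndexError with two otherwise valid indices:

```python
import numpy as np

class ShowIndex:
    """Hypothetical helper: returns whatever key Python passes to __getitem__."""
    def __getitem__(self, key):
        return key

show = ShowIndex()
print(show['a', 1:3])  # ('a', slice(1, 3, None))
print(show[..., 'a'])  # (Ellipsis, 'a')

arr = np.array(list(zip(range(5), range(5, 10))), dtype=[('a', int), ('b', int)])

# 0 and 1:3 are both valid index types, but arr has only one dimension
too_many = False
try:
    arr[0, 1:3]
except IndexError as exc:
    too_many = True
    print("IndexError:", exc)
```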
arr['a'][1:3] and arr[1:3]['a'] are both perfectly valid ways to index a slice of a field.
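A short sketch comparing the two orders on the array from the question. Field access and basic slicing both return views, so both expressions should refer to the same memory in arr, and assigning through one is visible in the other:

```python
import numpy as np

arr = np.array(list(zip(range(5), range(5, 10))), dtype=[('a', int), ('b', int)])

x = arr['a'][1:3]  # field first, then rows
y = arr[1:3]['a']  # rows first, then field
print(x.tolist(), y.tolist())  # [1, 2] [1, 2]

# Both are views into arr, so writing through x changes arr and y as well
x[:] = [-1, -2]
print(arr['a'].tolist())       # [0, -1, -2, 3, 4]
print(y.tolist())              # [-1, -2]
print(np.shares_memory(x, y))  # True
```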
Upvotes: 2