Ankur Agarwal

Reputation: 24758

numpy array indexing with lists and arrays

I have:

>>> a
array([[1, 2],
       [3, 4]])

>>> type(l), l # list of scalars
(<type 'list'>, [0, 1])

>>> type(i), i # a numpy array
(<type 'numpy.ndarray'>, array([0, 1]))

>>> type(j), j # list of numpy arrays
(<type 'list'>, [array([0, 1]), array([0, 1])])

When I do

>>> a[l] # Case 1, l is a list of scalars

I get

array([[1, 2],
       [3, 4]])

which means indexing happened only on 0th axis.

But when I do

>>> a[j] # Case 2, j is a list of numpy arrays

I get

array([1, 4])

which means indexing happened along axis 0 and axis 1.

Q1: When used for indexing, why are a list of scalars and a list of numpy arrays treated differently? (Case 1 vs Case 2). In Case 2, I was hoping to see indexing happen only along axis 0 and get

array([[[1, 2],
        [3, 4]],

       [[1, 2],
        [3, 4]]])

Now, when using numpy array of arrays instead

>>> j1 = np.array(j) # numpy array of arrays

The result below indicates that indexing happened only along axis 0 (as expected)

>>> a[j1] # Case 3, j1 is a numpy array of numpy arrays
array([[[1, 2],
        [3, 4]],

       [[1, 2],
        [3, 4]]])

Q2: When used for indexing, why are a list of numpy arrays and a numpy array of numpy arrays treated differently? (Case 2 vs Case 3)

Upvotes: 7

Views: 5640

Answers (2)

hpaulj

Reputation: 231375

Case1, a[l] is actually a[(l,)] which expands to a[(l, slice(None))]. That is, indexing the first dimension with the list l, and an automatic trailing : slice. Indices are passed as a tuple to the array __getitem__, and extra () may be added without confusion.
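To make the Case 1 expansion concrete, here is a minimal sketch that compares the three spellings. The result names (r_plain, r_tuple, r_slice, same) are made up for illustration; a and l are rebuilt from the question.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
l = [0, 1]  # list of scalars, as in the question

r_plain = a[l]      # index as written in Case 1
r_tuple = a[(l,)]   # the same index wrapped in a 1-tuple
r_slice = a[l, :]   # trailing slice for axis 1 made explicit

# All three select rows 0 and 1 and keep every column.
same = np.array_equal(r_plain, r_tuple) and np.array_equal(r_plain, r_slice)
print(same, r_plain.shape)
```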

Case2, a[j] is treated as a[array([0, 1]), array([0, 1])] or a[(array([0, 1]), array([0, 1]))]. In other words, as a tuple of indexing objects, one per dimension. It ends up returning a[0,0] and a[1,1].
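A small sketch of the tuple interpretation, with the tuple written out explicitly. How a bare list of arrays is interpreted has not been the same across NumPy versions, so tuple(j) is used here to pin down the behaviour described for Case 2; paired and manual are illustrative names.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
j = [np.array([0, 1]), np.array([0, 1])]  # list of arrays, as in the question

# One index array per dimension: row indices [0, 1] are paired
# element by element with column indices [0, 1].
paired = a[tuple(j)]

# The same elements picked one at a time.
manual = np.array([a[0, 0], a[1, 1]])

print(paired, np.array_equal(paired, manual))
```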

Case3, a[j1] is a[(j1, slice(None))], applying the j1 index to just the first dimension.
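As a rough check of the Case 3 reading, the sketch below indexes with a 2-D integer array and compares it with the explicit-slice form and with stacking the per-row selections by hand. The names r, r_explicit and stacked are illustrative only.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
j1 = np.array([[0, 1], [0, 1]])  # 2-D integer index array

r = a[j1]              # j1 indexes axis 0 only
r_explicit = a[j1, :]  # same thing with the axis-1 slice written out

# Each row of j1 selects rows of a; stacking those selections
# reproduces the (2, 2, 2) result.
stacked = np.stack([a[j1[0]], a[j1[1]]])

print(r.shape, np.array_equal(r, r_explicit), np.array_equal(r, stacked))
```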

Case2 is a bit of an anomaly. Your intuition is valid, but for historical reasons, this list of arrays (or list of lists) is interpreted as a tuple of arrays.

This has been discussed in other SO questions, and I think it is documented. But offhand I can't find those references.

So it's safer to use either a tuple of indexing objects, or an array. Indexing with a list has a potential ambiguity.


numpy array indexing: list index and np.array index give different result

This SO question touches on the same issue, though the clearest statement of what is happening is buried in a code link in a comment by @user2357112.

Another way of forcing Case3-like indexing is to make the 2nd dimension slice explicit, a[j,:]

In [166]: a[j]
Out[166]: array([1, 4])
In [167]: a[j,:]
Out[167]: 
array([[[1, 2],
        [3, 4]],

       [[1, 2],
        [3, 4]]])

(I often include the trailing : even if it isn't needed. It makes it clear to me, and readers, how many dimensions we are working with.)

Upvotes: 2

Sraw

Reputation: 20214

A1: The structure of l is not the same as j.

l is just one-dimensional while j is two-dimensional. If you change one of them:

# l = [0, 1]                                 # just one dimension!
l = [[0, 1], [0, 1]]                         # two dimensions
j = [np.array([0,1]), np.array([0, 1])]      # two dimensions

They behave the same.
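A quick way to check this without relying on how a bare nested list is interpreted is to convert both objects explicitly, once to a tuple and once to an array. This is only a sketch; l2 stands in for the redefined l, and the other names are illustrative.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
l2 = [[0, 1], [0, 1]]                     # nested list, two dimensions
j = [np.array([0, 1]), np.array([0, 1])]  # list of arrays, two dimensions

# Both describe the same (2, 2) index structure.
shape_l2 = np.array(l2).shape
shape_j = np.array(j).shape

# Under either explicit interpretation they give identical results.
as_tuple_equal = np.array_equal(a[tuple(l2)], a[tuple(j)])
as_array_equal = np.array_equal(a[np.array(l2)], a[np.array(j)])

print(shape_l2, shape_j, as_tuple_equal, as_array_equal)
```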

A2: The same reasoning applies: the index structures in Case 2 and Case 3 are not the same.

Upvotes: 0
