PyPan

Reputation: 25

array[row][col] vs array[row,col] in Python

What is the difference between indexing a 2D array with [row][col] vs [row, col] in numpy/pandas? Are there any implications of using one over the other?

For example:

import numpy as np

arr = np.array([[1, 2], [3, 4]])
print(arr[1][0])
print(arr[1, 0])

Both give 3.

Upvotes: 0

Views: 1366

Answers (2)

Tomerikoo

Reputation: 19414

Single-element indexing

For single-element indexing, as in your example, the result is indeed the same. However, as stated in the docs:

So note that x[0, 2] == x[0][2] though the second case is more inefficient as a new temporary array is created after the first index that is subsequently indexed by 2.

emphasis mine
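To make the "temporary array" in that quote visible, here is a minimal sketch. The array `x` and the name `row` are just illustrative choices, not anything from the docs:

```python
import numpy as np

x = np.array([[0, 1, 2], [3, 4, 5]])

# x[0][2] happens in two steps: the first index builds an
# intermediate array object, and the second index is applied to it.
row = x[0]
print(isinstance(row, np.ndarray))  # True
print(np.shares_memory(row, x))     # True

# x[0, 2] goes straight to the element without the intermediate array.
print(x[0, 2] == x[0][2])  # True
```

With a plain integer index the intermediate array is a view, so no data is copied, but a new array object still has to be created and then indexed again.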

Array indexing

In this case, double-indexing is not only less efficient; it simply gives different results. Let's look at an example:

>>> import numpy as np
>>> arr = np.array([[1, 2], [3, 4], [5, 6]])
>>> print(arr[1:][0])
[3 4]
>>> print(arr[1:, 0])
[3 5]

In the first case, we create a new array after the first index which is all rows from index 1 onwards:

>>> print(arr[1:])
[[3 4]
 [5 6]]

Then we simply take the first element of that new array which is [3 4].

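In the second case, we use NumPy's multidimensional indexing, where each comma-separated index applies to its own dimension instead of to the result of the previous index. So 1: selects rows 1 onwards and 0 selects column 0 of those rows. Instead of taking the first row, it takes the first column: [3 5].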

Upvotes: 2

Mia

Reputation: 2676

Using [row][col] takes one more function call than using [row, col]. When you index an array (or any object, for that matter), you are calling obj.__getitem__ under the hood. Since Python packs comma-separated indices into a tuple, obj[row][col] is equivalent to obj.__getitem__(row).__getitem__(col), whereas obj[row, col] is simply obj.__getitem__((row, col)). Therefore, indexing with [row, col] is more efficient because it makes one fewer function call (plus some namespace lookups, but those can normally be ignored).
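You can check this yourself with a toy class. KeyLogger below is a hypothetical helper written only for this illustration; it is not part of NumPy or pandas, and it just records whatever key it receives:

```python
class KeyLogger:
    """Hypothetical helper: records every key passed to __getitem__."""

    def __init__(self):
        self.calls = []

    def __getitem__(self, key):
        self.calls.append(key)
        return self  # return self so a chained index lands here too


obj = KeyLogger()

obj[1][0]                      # two separate __getitem__ calls
chained_calls = list(obj.calls)

obj.calls.clear()
obj[1, 0]                      # one call that receives a tuple
tuple_calls = list(obj.calls)

print(chained_calls)  # [1, 0]
print(tuple_calls)    # [(1, 0)]
```

The chained form reaches __getitem__ twice with one integer each time, while the comma form reaches it once with the tuple (1, 0).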

Upvotes: 1
