Reputation: 25
What is the difference between indexing a 2D array with [row][col] vs [row, col] in numpy/pandas? Are there any implications of using either of these two?
For example:
import numpy as np
arr = np.array([[1, 2], [3, 4]])
print(arr[1][0])
print(arr[1, 0])
Both give 3.
Upvotes: 0
Views: 1366
Reputation: 19414
For single-element indexing, as in your example, the result is indeed the same. However, as stated in the docs:
So note that
x[0,2] = x[0][2]
though the second case is more inefficient as a new temporary array is created after the first index that is subsequently indexed by 2.
emphasis mine
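To make the "temporary array" in that quote concrete, here is a minimal sketch. The variable names (`row`, `chained`, `direct`) are mine, and the timings are only indicative; they will vary by machine and numpy version, so no specific numbers are claimed.

```python
import timeit

import numpy as np

arr = np.array([[1, 2], [3, 4]])

# Chained indexing: the first [] returns an intermediate 1-D array
# (a view onto row 1 of arr), which is then indexed a second time.
row = arr[1]
chained = row[0]

# Tuple indexing: a single lookup, no intermediate array object.
direct = arr[1, 0]

print(type(row).__name__, row.base is arr)  # the intermediate is an ndarray view of arr
print(chained == direct)                    # same element either way

# Rough timing comparison; absolute values depend on your setup.
t_chained = timeit.timeit(lambda: arr[1][0], number=100_000)
t_direct = timeit.timeit(lambda: arr[1, 0], number=100_000)
print(f"arr[1][0]: {t_chained:.4f}s   arr[1, 0]: {t_direct:.4f}s")
```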
Once slices are involved, however, double-indexing is not only less efficient, it simply gives different results. Let's look at an example:
>>> arr = np.array([[1, 2], [3, 4], [5, 6]])
>>> arr[1:][0]
[3 4]
>>> arr[1:, 0]
[3 5]
In the first case, the first index creates a new array containing all rows from index 1 onwards:
>>> arr[1:]
[[3 4]
 [5 6]]
Then we simply take the first element of that new array, which is [3 4].
In the second case, we use numpy's multidimensional indexing, where each index in the tuple applies to its own dimension rather than to the result of the previous index. So instead of taking the first row of the sliced array, it takes column 0 of rows 1 onwards: [3 5].
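The two cases can be spelled out step by step. This is just a sketch of the example above with intermediate names of my own choosing (`tmp`, `first_row`, `first_col`, `chained_col`); the last line shows what the chained form would have to look like to reproduce the tuple-indexed result.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4], [5, 6]])

# Chained: each [] applies to the result of the previous one.
tmp = arr[1:]        # rows 1 onwards, shape (2, 2)
first_row = tmp[0]   # first row of tmp

# Tuple: each index applies to its own dimension of arr.
first_col = arr[1:, 0]   # rows 1 onwards, column 0

# To get the column with chained indexing, the second step
# must explicitly keep all rows and pick column 0.
chained_col = arr[1:][:, 0]

print(first_row)
print(first_col)
print(chained_col)
```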
Upvotes: 2
Reputation: 2676
Using [row][col] costs one more function call than using [row, col]. When you index an array (or any object, for that matter), you are calling obj.__getitem__ under the hood. Since Python packs the comma-separated indices into a tuple, obj[row][col] is equivalent to obj.__getitem__(row).__getitem__(col), whereas obj[row, col] is simply obj.__getitem__((row, col)). Therefore, indexing with [row, col] is more efficient because it makes one fewer function call (plus some namespace lookups, but those can normally be ignored).
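You can see this for yourself with a small stand-in object. `KeyRecorder` below is a hypothetical helper written for this illustration (it is not part of numpy or pandas); it just records whatever key Python passes to `__getitem__`.

```python
class KeyRecorder:
    """Hypothetical helper: records every key passed to __getitem__."""

    def __init__(self):
        self.keys = []

    def __getitem__(self, key):
        self.keys.append(key)
        return self  # return self so that indexing can be chained


rec = KeyRecorder()
rec[1][0]                       # two separate __getitem__ calls
chained_keys = list(rec.keys)

rec = KeyRecorder()
rec[1, 0]                       # one __getitem__ call with a tuple
tuple_keys = list(rec.keys)

print(chained_keys)  # [1, 0]
print(tuple_keys)    # [(1, 0)]
```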
Upvotes: 1