Reputation: 1109
I'm initializing a numpy array and then plugging in loop values at some arbitrary location; for some reason, indexing with double brackets (arr[1][2]) turns out to be roughly twice as slow as indexing with comma notation (arr[1, 2]).
EDIT: Based on Mike's answer, I'd like to understand how multidimensional indexing can be implemented as a single operation, and whether there ever is a point in using the first notation.
from timeit import Timer
import numpy as np

x = np.array([[1, 2, 3], [2, 3, 4], [3, 4, 5]])

def access(arr):
    for i in range(1000):
        arr[1][2] = i

def access2(arr):
    for i in range(1000):
        arr[1, 2] = i

t1 = Timer('access(x)', 'from __main__ import access, x')
print(t1.timeit(number=1000))
0.425940990448

t2 = Timer('access2(x)', 'from __main__ import access2, x')
print(t2.timeit(number=1000))
0.217132806778
Upvotes: 3
Views: 582
Reputation: 85442
This:
nparray[i][j]
means two indexing operations. It is essentially equivalent to:
tmp = nparray[i]
tmp[j]
So an intermediate array tmp is created that is not needed afterwards. That is extra work.
This, on the other hand:
nparray[i, j]
is a single indexing operation. NumPy is more efficient here because it does not need to create an intermediate array.
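You can observe the intermediate array yourself; a minimal sketch (the array x and the variable names are just for illustration):
import numpy as np

x = np.array([[1, 2, 3], [2, 3, 4], [3, 4, 5]])

tmp = x[1]          # first indexing operation: a 1D view of row 1
print(type(tmp))    # <class 'numpy.ndarray'>
tmp[2] = 99         # second indexing operation, on the intermediate array
print(x[1, 2])      # 99 -- the view writes back into x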
Here is what happens when you do nparray[i, j]: the class ndarray overrides the special methods __getitem__ and __setitem__.
For example:
>>> class A:
... def __getitem__(self, indices):
... print(indices)
... def __setitem__(self, indices, value):
... print(indices, value)
...
>>> a = A()
>>> a[1, 2]
(1, 2)
>>> a[1, 2] = 23
(1, 2) 23
You can see that the 1, 2 in a[1, 2] arrives as the tuple (1, 2). NumPy can then do whatever is appropriate with this information; in our case, a single 2D indexing operation.
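To make the tuple mechanics concrete, a small sketch (the array x is just an example): the following three assignments all perform the same single indexing operation.
import numpy as np

x = np.array([[1, 2, 3], [2, 3, 4], [3, 4, 5]])

x[1, 2] = 7               # syntactic sugar for the tuple form below
x[(1, 2)] = 7             # identical: the index really is the tuple (1, 2)
x.__setitem__((1, 2), 7)  # the call Python makes under the hood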
There are not many occasions for using the nparray[i][j] form. One of them is when you use a list of lists as a 2D structure and also (for some reason) want a NumPy array to work as a drop-in replacement. While it is slower, code written this way works with a list of lists as well as with a 2D NumPy array, as sketched below.
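For instance, a hypothetical helper (not from the answer) written with chained indexing accepts either container:
import numpy as np

def get_corner(arr):
    # arr[0, 0] would raise TypeError for a plain list of lists,
    # but chained indexing works on both, at some speed cost for ndarrays
    return arr[0][0]

print(get_corner([[1, 2], [3, 4]]))            # 1 (list of lists)
print(get_corner(np.array([[1, 2], [3, 4]])))  # 1 (2D NumPy array)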
Upvotes: 5
Reputation: 500267
Let's compare the disassembly:
>>> dis.dis(access)
2 0 SETUP_LOOP 34 (to 37)
3 LOAD_GLOBAL 0 (range)
6 LOAD_CONST 1 (1000)
9 CALL_FUNCTION 1
12 GET_ITER
>> 13 FOR_ITER 20 (to 36)
16 STORE_FAST 1 (i)
#------------- differences start here -------------#
3 19 LOAD_FAST 1 (i)
22 LOAD_FAST 0 (arr)
25 LOAD_CONST 2 (1)
28 BINARY_SUBSCR
29 LOAD_CONST 3 (2)
32 STORE_SUBSCR
#-------------- differences end here --------------#
33 JUMP_ABSOLUTE 13
>> 36 POP_BLOCK
>> 37 LOAD_CONST 0 (None)
40 RETURN_VALUE
>>> dis.dis(access2)
2 0 SETUP_LOOP 30 (to 33)
3 LOAD_GLOBAL 0 (range)
6 LOAD_CONST 1 (1000)
9 CALL_FUNCTION 1
12 GET_ITER
>> 13 FOR_ITER 16 (to 32)
16 STORE_FAST 1 (i)
#------------- differences start here -------------#
3 19 LOAD_FAST 1 (i)
22 LOAD_FAST 0 (arr)
25 LOAD_CONST 4 ((1, 2))
28 STORE_SUBSCR
#-------------- differences end here --------------#
29 JUMP_ABSOLUTE 13
>> 32 POP_BLOCK
>> 33 LOAD_CONST 0 (None)
36 RETURN_VALUE
From this, we can see that the first version involves two extra bytecode operations per assignment (an extra LOAD_CONST followed by a BINARY_SUBSCR). Your experiment suggests that this is more expensive than a single subscript store keyed by the tuple (1, 2), as done by the second function.
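If you want to reproduce the comparison, a minimal sketch (reusing the access/access2 definitions from the question; note that opcode names vary across CPython versions, and the listing above comes from an older CPython):
import dis

def access(arr):
    for i in range(1000):
        arr[1][2] = i

def access2(arr):
    for i in range(1000):
        arr[1, 2] = i

dis.dis(access)   # extra LOAD_CONST / BINARY_SUBSCR pair for arr[1][2]
dis.dis(access2)  # single STORE_SUBSCR with the tuple constant (1, 2)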
Upvotes: 4