Reputation: 372
I've been using numpy for some time, and I'm used to accessing arrays with the index operator, just like Python lists, like so:
img = np.zeros((640,480,3))
img[34, 19, 2]
However, while reading a certain book, I came across the item and itemset methods. According to the docs, they provide a performance improvement. However, it is not explained anywhere, at least not in the documentation, why that happens.
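For reference, this is how they are used (itemset takes the index first and the value last):
value = img.item(34, 19, 2)       # read, equivalent to img[34, 19, 2]
img.itemset((34, 19, 2), 0.5)     # write, equivalent to img[34, 19, 2] = 0.5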
Does anyone know the reason for it?
Upvotes: 0
Views: 1074
Reputation: 50668
TL;DR: The difference in speed comes from the different types used by item/itemset and the fact that the [] operator is more generic. Indeed, both use the built-in float type of the Python interpreter, while img[34, 19, 2] returns the Python object np.float64. Moreover, the [] operator supports not only direct indexing, but also array sub-views and array filtering, which are not supported by item/itemset.
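This is easy to check from an interactive session:
import numpy as np

img = np.zeros((640, 480, 3))
print(type(img.item(34, 19, 2)))  # <class 'float'>: a plain Python float
print(type(img[34, 19, 2]))       # <class 'numpy.float64'>: a NumPy scalar object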
To fully understand why there is a performance difference, one should look at the numpy code. The item and itemset methods call array_toscalar and array_setscalar respectively. In contrast, getting and setting an array element directly calls array_subscript and array_assign_subscript respectively.
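A quick micro-benchmark sketch to see the gap (absolute timings will vary with the machine and NumPy version):
import timeit
import numpy as np

img = np.zeros((640, 480, 3))

# Generic path: array_subscript
print(timeit.timeit(lambda: img[34, 19, 2], number=10**6))

# Leaner path: array_toscalar
print(timeit.timeit(lambda: img.item(34, 19, 2), number=10**6))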
The last two methods are a bit more costly because they are more generic. Indeed, by looking at the difference between array_toscalar and array_subscript, one can see that the former executes few computations and mainly calls PyArray_MultiIndexGetItem, which calls DOUBLE_getitem, while the latter executes more checks and allocations and mainly calls PyArray_Scalar, which calls scalar_value, which itself performs an indirect jump to finally produce an np.float64 object.
Note that although item and itemset can be faster than the [] operator, numpy direct indexing in CPython is still pretty slow. Numba can speed it up a lot by performing native direct indexing.
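For example, a sketch with Numba (assuming it is installed; inside an @njit function, element accesses compile to native loads with no Python objects created per element):
import numpy as np
from numba import njit

@njit
def sum_channel(img, c):
    # Plain img[i, j, c] indexing is compiled to a native memory access here
    total = 0.0
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            total += img[i, j, c]
    return total

img = np.zeros((640, 480, 3))
print(sum_channel(img, 2))  # the first call also pays the JIT compilation cost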
Upvotes: 2