Reputation: 372
I've been using numpy for some time, and I'm used to accessing arrays with the index operator, just like Python lists, like so:
img = np.zeros((640,480,3))
img[34, 19, 2]
However, while reading a certain book, I came across the item and itemset methods. According to the docs, they provide a performance improvement. However, it is not explained anywhere, at least not in the documentation, why that happens.
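For reference, this is how they are used (itemset takes the index first and the value last):
value = img.item(34, 19, 2)       # read, equivalent to img[34, 19, 2]
img.itemset((34, 19, 2), 0.5)     # write, equivalent to img[34, 19, 2] = 0.5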
Does anyone know the reason for it?
Upvotes: 0
Views: 1074
Reputation: 50668
TL;DR: The difference in speed comes from the different types used by item/itemset and the fact that the [] operator is more generic. Indeed, both use the built-in float type of the Python interpreter, while img[34, 19, 2] returns the Python object np.float64. Moreover, the [] operator supports not only direct indexing, but also array sub-views and array filtering, which are not supported by item/itemset.
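This is easy to check from an interactive session:
import numpy as np

img = np.zeros((640, 480, 3))
print(type(img.item(34, 19, 2)))  # <class 'float'>: a plain Python float
print(type(img[34, 19, 2]))       # <class 'numpy.float64'>: a NumPy scalar object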
To fully understand why there is a performance difference, one should look at the numpy code. The item and itemset methods call array_toscalar and array_setscalar respectively. In contrast, getting and setting an array element directly calls array_subscript and array_assign_subscript respectively.
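A quick micro-benchmark sketch to see the gap (absolute timings will vary with the machine and NumPy version):
import timeit
import numpy as np

img = np.zeros((640, 480, 3))

# Generic path: array_subscript
print(timeit.timeit(lambda: img[34, 19, 2], number=10**6))

# Leaner path: array_toscalar
print(timeit.timeit(lambda: img.item(34, 19, 2), number=10**6))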
The last two methods are a bit more costly because they are more generic. Indeed, by looking at the difference between array_toscalar and array_subscript, one can see that the former executes few computations and mainly calls PyArray_MultiIndexGetItem, which calls DOUBLE_getitem, while the latter executes more checks and allocations and mainly calls PyArray_Scalar, which calls scalar_value, which itself performs an indirect jump to finally produce an np.float64 object.
Note that although item and itemset can be faster than the [] operator, numpy direct indexing in CPython is still pretty slow. Numba can speed it up a lot by performing native direct indexing.
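For example, a sketch with Numba (assuming it is installed; inside an @njit function, element accesses compile to native loads with no Python objects created per element):
import numpy as np
from numba import njit

@njit
def sum_channel(img, c):
    # Plain img[i, j, c] indexing is compiled to a native memory access here
    total = 0.0
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            total += img[i, j, c]
    return total

img = np.zeros((640, 480, 3))
print(sum_channel(img, 2))  # the first call also pays the JIT compilation cost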
Upvotes: 2