Huayi Wei

Reputation: 849

How to improve the efficiency of array operations on an indexed view of a `numpy` array?

The following example code computes array B from A:

import numpy as np
idx1 = np.array([
 [3, 0, 0],
 [2, 1, 0],
 [2, 0, 1],
 [1, 2, 0],
 [1, 1, 1],
 [1, 0, 2],
 [0, 3, 0],
 [0, 2, 1],
 [0, 1, 2],
 [0, 0, 3]])
idx2 = np.arange(3)  # column indices 0, 1, 2; broadcast against each row of idx1
A = np.arange(10*4*3).reshape(10, 4, 3)
B = np.prod(A[:, idx1, idx2], axis=2)  # B has shape (10, 10)
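To make the indexing concrete, here is a small sketch (using only the arrays defined above) of what `A[:, idx1, idx2]` produces. `idx1` has shape (10, 3) and `idx2` has shape (3,); they broadcast to (10, 3), so the indexed result has shape (len(A), 10, 3). The explicit loop is only there to spell out the element-wise meaning and is not meant as a faster alternative.

```python
import numpy as np

idx1 = np.array([[3, 0, 0], [2, 1, 0], [2, 0, 1], [1, 2, 0], [1, 1, 1],
                 [1, 0, 2], [0, 3, 0], [0, 2, 1], [0, 1, 2], [0, 0, 3]])
idx2 = np.arange(3)
A = np.arange(10 * 4 * 3).reshape(10, 4, 3)

# idx1 (10, 3) and idx2 (3,) broadcast to (10, 3), so temp is (len(A), 10, 3).
temp = A[:, idx1, idx2]
print(temp.shape)  # (10, 10, 3)

# Element-wise meaning: temp[i, j, k] == A[i, idx1[j, k], idx2[k]],
# and B[i, j] is the product of temp[i, j, :].
B_loop = np.empty((A.shape[0], idx1.shape[0]), dtype=A.dtype)
for i in range(A.shape[0]):
    for j in range(idx1.shape[0]):
        p = 1
        for k in range(idx2.shape[0]):
            p *= A[i, idx1[j, k], idx2[k]]
        B_loop[i, j] = p

B = np.prod(temp, axis=2)
print(np.array_equal(B, B_loop))  # True
```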

Notice the line

B = np.prod(A[:, idx1, idx2], axis=2)

Is this line memory efficient? Or does numpy generate an intermediate array for A[:, idx1, idx2]?

One can imagine that if len(A) is very large and numpy generates an intermediate array for A[:, idx1, idx2], this is not memory efficient. Is there a better way to do this?
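Some rough arithmetic shows why the concern grows with len(A). Assuming int64 data (8 bytes per element) and a hypothetical len(A) of one million chosen only for illustration, the intermediate of shape (len(A), 10, 3) is 2.5 times the size of A itself (shape (len(A), 4, 3)):

```python
import numpy as np

idx1 = np.array([[3, 0, 0], [2, 1, 0], [2, 0, 1], [1, 2, 0], [1, 1, 1],
                 [1, 0, 2], [0, 3, 0], [0, 2, 1], [0, 1, 2], [0, 0, 3]])
idx2 = np.arange(3)

n = 1_000_000                              # hypothetical len(A)
itemsize = np.dtype(np.int64).itemsize     # 8 bytes, assuming int64 data

a_bytes = n * 4 * 3 * itemsize                         # A: (n, 4, 3)
temp_bytes = n * idx1.shape[0] * idx1.shape[1] * itemsize  # temp: (n, 10, 3)
b_bytes = n * idx1.shape[0] * itemsize                 # B: (n, 10)
print(a_bytes / 1e6, temp_bytes / 1e6, b_bytes / 1e6)  # 96.0 240.0 80.0 (MB)

# Cross-check the formula on a small real array.
A_small = np.zeros((100, 4, 3), dtype=np.int64)
print(A_small[:, idx1, idx2].nbytes)  # 24000 == 100 * 10 * 3 * 8
```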

Upvotes: 1

Views: 75

Answers (1)

hpaulj

Reputation: 231738

This expression is parsed and evaluated by the Python interpreter:

B = np.prod(A[:, idx1, idx2], axis=2)

First it evaluates the indexing expression:

temp = A[:, idx1, idx2]   # expands to:
temp = A.__getitem__((slice(None), idx1, idx2))   # the subscript is passed as one tuple

Since idx1, idx2 are arrays, this is advanced indexing, and temp is a copy, not a view.
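A quick way to check both points, the single-tuple `__getitem__` call and the copy-versus-view distinction, is `np.shares_memory`. The basic slice `A[:, 1:3, :]` is an arbitrary example added for contrast:

```python
import numpy as np

idx1 = np.array([[3, 0, 0], [2, 1, 0], [2, 0, 1], [1, 2, 0], [1, 1, 1],
                 [1, 0, 2], [0, 3, 0], [0, 2, 1], [0, 1, 2], [0, 0, 3]])
idx2 = np.arange(3)
A = np.arange(10 * 4 * 3).reshape(10, 4, 3)

# The subscript is packed into a single tuple before __getitem__ is called.
temp1 = A[:, idx1, idx2]
temp2 = A.__getitem__((slice(None), idx1, idx2))
print(np.array_equal(temp1, temp2))  # True

# Advanced (integer-array) indexing returns a new array, not a view of A.
print(np.shares_memory(temp1, A))    # False

# Basic slicing, by contrast, returns a view into A's buffer.
view = A[:, 1:3, :]
print(np.shares_memory(view, A))     # True
```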

Next the interpreter executes:

np.prod(temp, axis=2)

that is, it passes the temporary array to the prod function, which returns a new array that is then assigned to the variable B.
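As a sketch of that evaluation order: `np.prod` accepts an `out=` argument, so the result array can be preallocated with shape `temp.shape[:-1]`. This only avoids allocating the result; the argument `A[:, idx1, idx2]` is still fully materialized before `np.prod` runs.

```python
import numpy as np

idx1 = np.array([[3, 0, 0], [2, 1, 0], [2, 0, 1], [1, 2, 0], [1, 1, 1],
                 [1, 0, 2], [0, 3, 0], [0, 2, 1], [0, 1, 2], [0, 0, 3]])
idx2 = np.arange(3)
A = np.arange(10 * 4 * 3).reshape(10, 4, 3)

temp = A[:, idx1, idx2]                          # built first, shape (10, 10, 3)
B = np.empty(temp.shape[:-1], dtype=temp.dtype)  # preallocated result, shape (10, 10)
np.prod(temp, axis=2, out=B)                     # the product is written into B
```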

I don't know how much buffering prod does. I can imagine it setting up an nditer (the C-API version) that takes two operand arrays: temp, and an output of the right shape (temp.shape[:-1], assuming the reduction is over the last dimension of temp). See the reduction section of the docs that I cited in The `out` arguments in `numpy.einsum` can not work as expected.
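The reduction pattern described above can be imitated with the Python-level `np.nditer`, following the reduction-iteration example in the NumPy docs. This is only an illustration of the two-operand idea; it is not how `np.prod` is actually implemented, and it is much slower than `np.prod`.

```python
import numpy as np

idx1 = np.array([[3, 0, 0], [2, 1, 0], [2, 0, 1], [1, 2, 0], [1, 1, 1],
                 [1, 0, 2], [0, 3, 0], [0, 2, 1], [0, 1, 2], [0, 0, 3]])
idx2 = np.arange(3)
A = np.arange(10 * 4 * 3).reshape(10, 4, 3)
temp = A[:, idx1, idx2]  # shape (10, 10, 3)

# Two operands: temp (read-only) and an output allocated by the iterator.
# op_axes [0, 1, -1] maps the output onto temp's first two axes; -1 marks
# the last axis as reduced, so the output has shape temp.shape[:-1].
it = np.nditer([temp, None], flags=['reduce_ok'],
               op_flags=[['readonly'], ['readwrite', 'allocate']],
               op_axes=[None, [0, 1, -1]])
with it:
    it.operands[1][...] = 1   # identity element for a product
    for x, y in it:
        y[...] *= x           # accumulate the product along the last axis
    result = it.operands[1]

print(result.shape)                                   # (10, 10)
print(np.array_equal(result, np.prod(temp, axis=2)))  # True
```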

In sum, Python, when evaluating a function, first evaluates all the arguments, and then passes them to the function. Evaluation of lists can be delayed by using generators, but there isn't an equivalent for numpy arrays.
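The contrast can be sketched as follows: a generator expression passed to `sum` yields one value at a time, while a NumPy expression used as an argument is fully evaluated into an array first. The `chunked_prod` helper is a hypothetical workaround, not something from the answer: it bounds the size of each temporary by processing A in slices along axis 0 and writing into a preallocated result.

```python
import numpy as np

# Generator: sum() pulls one value at a time; no full list is built.
total = sum(i * i for i in range(1_000_000))

# NumPy: the squared array is fully built before np.sum runs.
total_np = np.sum(np.arange(1_000_000, dtype=np.int64) ** 2)


def chunked_prod(A, idx1, idx2, chunk=1000):
    """Hypothetical helper: each temporary has at most chunk * 10 * 3 elements."""
    out = np.empty((A.shape[0], idx1.shape[0]), dtype=A.dtype)
    for start in range(0, A.shape[0], chunk):
        stop = start + chunk  # slicing past the end is safe
        np.prod(A[start:stop, idx1, idx2], axis=2, out=out[start:stop])
    return out


idx1 = np.array([[3, 0, 0], [2, 1, 0], [2, 0, 1], [1, 2, 0], [1, 1, 1],
                 [1, 0, 2], [0, 3, 0], [0, 2, 1], [0, 1, 2], [0, 0, 3]])
idx2 = np.arange(3)
A = np.arange(10 * 4 * 3).reshape(10, 4, 3)
B = chunked_prod(A, idx1, idx2, chunk=3)
```

The chunk size trades peak memory against per-call overhead; each iteration still creates a temporary, just a smaller one.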

Upvotes: 2
