NPE

Reputation: 500475

numpy: boolean indexing and memory usage

Consider the following numpy code:

A[start:end] = B[mask]

Here:

In principle, the above operation can be carried out using O(1) temporary storage, by copying elements of B directly into A.

Is this what actually happens in practice, or does numpy construct a temporary array for B[mask]? If the latter, is there a way to avoid this by rewriting the statement?

Upvotes: 7

Views: 1360

Answers (2)

Sven Marnach

Reputation: 601879

The line

A[start:end] = B[mask]

will, according to the Python language definition, first evaluate the right-hand side, yielding a new array that contains the selected rows of B and occupies additional memory. The most memory-efficient pure-Python way I'm aware of to avoid this is an explicit loop:

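from itertools import compress
# Python 3: zip() and range() are lazy; on Python 2 use itertools.izip and xrange instead
for i, b in zip(range(start, end), compress(B, mask)):
    A[i] = b  # copy one selected row of B directly into A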

Of course this will be much less time-efficient than your original code, but it only uses O(1) additional memory. Also note that itertools.compress() is available in Python 2.7 or 3.1 or above.
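If the row-by-row loop is too slow, one possible middle ground, sketched here under my own assumptions, is to walk B in chunks and let np.compress write each chunk's selected rows straight into A through its out argument. The helper name masked_copy_into and the chunk parameter are hypothetical; A and B are assumed to be NumPy arrays with the same dtype and row shape, and A is assumed to be C-contiguous (otherwise NumPy may copy internally). As far as I know, np.compress still builds an integer index array of the selected positions, so the extra memory is bounded by the chunk size rather than being strictly O(1), but the full B[mask] is never materialized.

```python
import numpy as np


def masked_copy_into(A, start, B, mask, chunk=4096):
    """Hypothetical helper: copy the rows B[mask] into A[start:...] chunk by chunk.

    Only `chunk` rows of B (an arbitrary default) are considered at a time, so
    any temporaries np.compress creates are bounded by the chunk size instead
    of by the total number of selected rows. Returns the index just past the
    last row written into A.
    """
    pos = start
    for lo in range(0, len(B), chunk):
        m = mask[lo:lo + chunk]            # slice of the mask: a view, no copy
        n = int(np.count_nonzero(m))       # number of rows selected in this chunk
        if n:
            # out= makes np.compress write the selected rows directly into A
            np.compress(m, B[lo:lo + chunk], axis=0, out=A[pos:pos + n])
            pos += n
    return pos


B = np.arange(20).reshape(10, 2)
mask = np.array([True, False, True, True, False, False, True, False, False, True])
A = np.zeros((8, 2), dtype=B.dtype)
end = masked_copy_into(A, 1, B, mask, chunk=3)
print(end)  # 6
```

Larger chunks run closer to the speed of the original one-liner; smaller chunks keep the temporaries smaller.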

Upvotes: 3

tillsten

Reputation: 14878

Using a boolean array as an index is fancy (advanced) indexing, so numpy has to make a copy. If you are running into memory problems, you could write a Cython extension to do the copy directly.
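A quick way to see the copy described above, using small arrays of my own choosing: np.shares_memory reports whether two arrays overlap in memory, so a boolean-indexed result can be compared with a basic slice, which is a view.

```python
import numpy as np

B = np.arange(12.0).reshape(6, 2)           # float64, 6 rows x 2 columns
mask = np.array([True, False, True, False, False, True])

fancy = B[mask]   # boolean (fancy) indexing: allocates a new array
basic = B[1:4]    # basic slicing: a view into B's buffer

print(np.shares_memory(fancy, B))  # False
print(np.shares_memory(basic, B))  # True
print(fancy.nbytes)                # 48 (3 selected rows * 2 columns * 8 bytes)
```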

Upvotes: 2
