Should I prefer to use numpy.where or array indexing to mask values?

Question

I use array indexing a fair amount to wipe out invalid values in arrays. Something like this:

array[array == 0] = invalid_value

For these kinds of masks, should I use numpy.where instead, like this:

array = numpy.where(array == 0, invalid_value, array)

perimosocordiae · Accepted Answer

It depends on what you intend. The first operation modifies array in place, whereas the second builds a new array and rebinds the name array to it, leaving the original data untouched.
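The difference is easiest to see when a second name refers to the same array. Here is a minimal sketch; the names data, alias, and invalid_value are made up for illustration and are not from the question:

```python
import numpy as np

invalid_value = -1  # illustrative sentinel, not from the original post

# In-place masking: every reference to the array sees the change.
data = np.array([0, 3, 0, 7])
alias = data                       # second name for the same array object
data[data == 0] = invalid_value
print(alias)                       # alias reflects the modification

# numpy.where: a new array is built and only the name data2 is rebound.
data2 = np.array([0, 3, 0, 7])
alias2 = data2
data2 = np.where(data2 == 0, invalid_value, data2)
print(alias2)                      # alias2 still holds the original values
print(data2 is alias2)             # the two names now refer to different arrays
```

So if other code holds a reference to the array (a function argument, for instance), indexing changes what that code sees, while numpy.where does not.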

If you don't mind the in-place modification, my quick tests on a 10,000-element integer array show that the first option is about 4x faster.

In [7]: foo = np.random.randint(0, 10, 10000)

In [8]: invalid = -1

In [9]: bar = foo.copy()

In [10]: %timeit bar[foo==0] = invalid
10000 loops, best of 3: 45.5 us per loop

In [11]: %timeit np.where(foo==0, invalid, foo)
1000 loops, best of 3: 209 us per loop

Note that foo is unchanged, while bar was modified:

In [12]: np.count_nonzero(foo)
Out[12]: 8984

In [13]: np.count_nonzero(bar)
Out[13]: 10000
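To repeat the comparison outside IPython, something like the following should work. It is only a sketch: it assumes NumPy 1.17 or later for np.random.default_rng, the seed and loop count are arbitrary, and the timings will differ by machine and NumPy version, so the 4x figure above may not reproduce exactly.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)       # arbitrary seed, for repeatability only
foo = rng.integers(0, 10, 10000)     # same shape of test data as above
invalid = -1
bar = foo.copy()

def mask_in_place():
    # Boolean-index assignment: writes into bar, allocates only the mask.
    bar[foo == 0] = invalid

def mask_with_where():
    # numpy.where: allocates the mask and a full-size result array.
    return np.where(foo == 0, invalid, foo)

loops = 1000
t_index = timeit.timeit(mask_in_place, number=loops)
t_where = timeit.timeit(mask_with_where, number=loops)

print(f"indexing: {t_index / loops * 1e6:.1f} us per loop")
print(f"np.where: {t_where / loops * 1e6:.1f} us per loop")
```

Both approaches produce the same values; they differ only in whether the original array is kept.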
