NumPy masked operation?

Question

Say there's an np.float32 matrix A of shape (N, M). Alongside A, I have another matrix B of boolean dtype with exactly the same shape, so each element of A maps 1:1 to an element of B. Example:

A =
[
    [0.1, 0.2, 0.3],
    [4.02, 123.4, 534.65],
    [2.32, 22.0, 754.01],
    [5.41, 23.1, 1245.5],
    [6.07, 0.65, 22.12],
]

B = 
[
    [True, False, True],
    [False, False, True],
    [True, True, False],
    [True, True, True],
    [True, False, True],
]

Now, I'd like to perform np.max, np.min, np.argmax and np.argmin on axis=1 of A, but only considering elements A[i,j] for which B[i,j] == True. Is it possible to do something like this in NumPy? The for-loop version is trivial, but I'm wondering whether I can get some of that juicy NumPy speed.

The result for A, B and np.max (for example) would be:

[ 0.3, 534.65, 22.0, 1245.5, 22.12 ]

I've avoided np.ma because I've heard that the computation gets very slow, and I don't think specifying a fill_value makes sense in this context. I just want the numbers to be ignored.

Also, if it matters at all in my case, N is in the thousands and M is in the single digits.

Matt Hall · Accepted Answer

This is a textbook application for masked arrays. But as always there are other ways to do it.

import numpy as np

A = np.array([[ 0.1,    0.2,    0.3],
              [ 4.02, 123.4,  534.65],
              [ 2.32,  22.0,  754.01],
              [ 5.41,  23.1, 1245.5],
              [ 6.07,  0.65,   22.12]])

B = np.array([[ True, False,  True],
              [False, False,  True],
              [ True,  True, False],
              [ True,  True,  True],
              [ True, False,  True]])

With `nanmax` etc.

You could set the 'invalid' values to NaN (say), then use NumPy's special NaN-ignoring functions:

>>> A[~B] = np.nan  # <-- Note this mutates A
>>> np.nanmax(A, axis=1)
array([3.0000e-01, 5.3465e+02, 2.2000e+01, 1.2455e+03, 2.2120e+01])
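If you'd rather not mutate A, a minimal sketch along the same lines is to build a NaN-filled copy with np.where and run the NaN-ignoring reductions on that. The names A_nan, row_max and so on are just illustrative. I'm assuming every row has at least one True in B; for a row that is entirely masked, np.nanmax returns NaN with a RuntimeWarning and np.nanargmax raises a ValueError.

```python
import numpy as np

A = np.array([[0.1, 0.2, 0.3],
              [4.02, 123.4, 534.65],
              [2.32, 22.0, 754.01],
              [5.41, 23.1, 1245.5],
              [6.07, 0.65, 22.12]])

B = np.array([[True, False, True],
              [False, False, True],
              [True, True, False],
              [True, True, True],
              [True, False, True]])

# Copy of A with NaN wherever B is False; A itself is left untouched.
A_nan = np.where(B, A, np.nan)

row_max = np.nanmax(A_nan, axis=1)
row_min = np.nanmin(A_nan, axis=1)
row_argmax = np.nanargmax(A_nan, axis=1)  # column index of each row's max
row_argmin = np.nanargmin(A_nan, axis=1)  # column index of each row's min
```

The argmax/argmin results are column indices into the original rows of A, so A[np.arange(len(A)), row_argmax] gives back row_max.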

The catch is that, while np.nanmax, np.nanmin, np.nanargmax, and np.nanargmin all exist, lots of functions don't have a NaN-ignoring twin, so you might have to come up with something else eventually.
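One possible "something else" that avoids NaN entirely: np.max and np.min accept where= and initial= arguments (where= needs NumPy 1.17 or later), so the boolean matrix can be passed straight in. np.argmax and np.argmin have no where= argument, so the sketch below falls back on filling the ignored entries with -inf or +inf first. This is only a sketch; it assumes a float array and at least one True per row (an all-False row would return the initial value, and its argmax would be 0).

```python
import numpy as np

A = np.array([[0.1, 0.2, 0.3],
              [4.02, 123.4, 534.65],
              [2.32, 22.0, 754.01],
              [5.41, 23.1, 1245.5],
              [6.07, 0.65, 22.12]])

B = np.array([[True, False, True],
              [False, False, True],
              [True, True, False],
              [True, True, True],
              [True, False, True]])

# Reduce only over elements where B is True; `initial` is the starting
# value, so it must be neutral for the reduction (-inf for max, +inf for min).
row_max = np.max(A, axis=1, where=B, initial=-np.inf)
row_min = np.min(A, axis=1, where=B, initial=np.inf)

# argmax/argmin take no `where`, so push ignored entries out of contention.
row_argmax = np.where(B, A, -np.inf).argmax(axis=1)
row_argmin = np.where(B, A, np.inf).argmin(axis=1)
```

The same sentinel trick works for any reduction that has a neutral element, for example filling with 0 before a sum.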

With `ma`

It seems weird not to mention masked arrays, which are straightforward. Notice that the mask is (to my mind anyway) 'backwards': True means the value is 'masked', i.e. invalid, and will be ignored. That is why B has to be negated with the tilde. Then you can do what you want with the masked array:

>>> X = np.ma.masked_array(A, mask=~B)  # <--- Note the tilde.
>>> np.max(X, axis=1)
masked_array(data=[0.3, 534.65, 22.0, 1245.5, 22.12],
             mask=[False, False, False, False, False],
       fill_value=1e+20)
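The other three reductions from the question work the same way. Here is a short sketch, with illustrative variable names, that also converts the masked results back to plain arrays. As far as I know, the fill_value shown in the repr is only used if you call .filled() without an argument; it does not take part in the max or min itself, so there is nothing you need to specify.

```python
import numpy as np

A = np.array([[0.1, 0.2, 0.3],
              [4.02, 123.4, 534.65],
              [2.32, 22.0, 754.01],
              [5.41, 23.1, 1245.5],
              [6.07, 0.65, 22.12]])

B = np.array([[True, False, True],
              [False, False, True],
              [True, True, False],
              [True, True, True],
              [True, False, True]])

X = np.ma.masked_array(A, mask=~B)  # True in the mask means "ignore"

# max/min return masked arrays; .filled(np.nan) turns them into plain
# ndarrays, with NaN for any row that was masked entirely.
row_max = X.max(axis=1).filled(np.nan)
row_min = X.min(axis=1).filled(np.nan)

# argmax/argmin return ordinary integer arrays of column indices.
row_argmax = X.argmax(axis=1)
row_argmin = X.argmin(axis=1)
```

Whether this is slower than the NaN approach for N in the thousands is worth timing on your own data rather than assuming.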
