Captain Trojan

Reputation: 2921

NumPy masked operation?

Say there's a np.float32 matrix A of shape (N, M). Together with A, I possess another matrix B, of type np.bool, of the exact same shape (elements from A can be mapped 1:1 to B). Example:

A =
[
    [0.1, 0.2, 0.3],
    [4.02, 123.4, 534.65],
    [2.32, 22.0, 754.01],
    [5.41, 23.1, 1245.5],
    [6.07, 0.65, 22.12],
]

B = 
[
    [True, False, True],
    [False, False, True],
    [True, True, False],
    [True, True, True],
    [True, False, True],
]

Now, I'd like to perform np.max, np.min, np.argmax and np.argmin on axis=1 of A, but only considering elements A[i,j] for which B[i,j] == True. Is it possible to do something like this in NumPy? The for-loop version is trivial, but I'm wondering whether I can get some of that juicy NumPy speed.

The result for A, B and np.max (for example) would be:

[ 0.3, 534.65, 22.0, 1245.5, 22.12 ]

I've avoided ma because I've heard that the computation gets very slow and I don't feel like specifying fill_value makes sense in this context. I just want the numbers to be ignored.

Also, if it matters at all in my case, N is in the thousands and M is in the single digits.

Upvotes: 1

Views: 72

Answers (1)

Matt Hall

Reputation: 8142

This is a textbook application for masked arrays. But as always there are other ways to do it.

import numpy as np

A = np.array([[ 0.1,    0.2,    0.3],
              [ 4.02, 123.4,  534.65],
              [ 2.32,  22.0,  754.01],
              [ 5.41,  23.1, 1245.5],
              [ 6.07,  0.65,   22.12]])

B = np.array([[ True, False,  True],
              [False, False,  True],
              [ True,  True, False],
              [ True,  True,  True],
              [ True, False,  True]])

With nanmax etc.

You could set the 'invalid' values to NaN (say), which works because A is a float array, then use NumPy's special NaN-ignoring functions:

>>> A[~B] = np.nan  # <-- Note this mutates A
>>> np.nanmax(A, axis=1)
array([3.0000e-01, 5.3465e+02, 2.2000e+01, 1.2455e+03, 2.2120e+01])

The catch is that, while np.nanmax, np.nanmin, np.nanargmax, and np.nanargmin all exist, lots of functions don't have a NaN-ignoring twin, so you might have to come up with something else eventually.
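
If you'd rather not overwrite A, here is a minimal sketch of the same NaN idea that works on a copy built with np.where and covers all four reductions from the question. The names A_nan, row_max, row_min, row_argmax and row_argmin are just illustrative, and it assumes every row has at least one True in B.

```python
import numpy as np

A = np.array([[0.1, 0.2, 0.3],
              [4.02, 123.4, 534.65],
              [2.32, 22.0, 754.01],
              [5.41, 23.1, 1245.5],
              [6.07, 0.65, 22.12]])

B = np.array([[True, False, True],
              [False, False, True],
              [True, True, False],
              [True, True, True],
              [True, False, True]])

# New array with NaN wherever B is False; A itself is left untouched.
A_nan = np.where(B, A, np.nan)

# The NaN-ignoring reductions skip the positions that were set to NaN.
row_max = np.nanmax(A_nan, axis=1)
row_min = np.nanmin(A_nan, axis=1)
row_argmax = np.nanargmax(A_nan, axis=1)  # column index of each row's max
row_argmin = np.nanargmin(A_nan, axis=1)  # column index of each row's min

print(row_max)
print(row_min)
print(row_argmax)
print(row_argmin)
```

One thing to watch: if a row of B is entirely False, np.nanmax and np.nanmin return NaN for that row with a RuntimeWarning, while np.nanargmax and np.nanargmin raise a ValueError, so such rows need handling separately.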

With ma

It seems weird not to mention masked arrays, which are straightforward. Notice that the mask is (to my mind anyway) 'backwards': True means the value is 'masked', i.e. invalid, and will be ignored. That's why B has to be negated with the tilde. Then you can do what you want with the masked array:

>>> X = np.ma.masked_array(A, mask=~B)  # <--- Note the tilde.
>>> np.max(X, axis=1)
masked_array(data=[0.3, 534.65, 22.0, 1245.5, 22.12],
             mask=[False, False, False, False, False],
       fill_value=1e+20)
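
The other three reductions from the question work the same way. A short sketch using the masked array's own methods follows; the variable names are illustrative, and converting back with filled(np.nan) is just one option I'm assuming you might want for a plain ndarray.

```python
import numpy as np

A = np.array([[0.1, 0.2, 0.3],
              [4.02, 123.4, 534.65],
              [2.32, 22.0, 754.01],
              [5.41, 23.1, 1245.5],
              [6.07, 0.65, 22.12]])

B = np.array([[True, False, True],
              [False, False, True],
              [True, True, False],
              [True, True, True],
              [True, False, True]])

# mask=True means "ignore this element", hence the negation of B.
X = np.ma.masked_array(A, mask=~B)

row_max = X.max(axis=1)        # masked array of per-row maxima
row_min = X.min(axis=1)        # masked array of per-row minima
row_argmax = X.argmax(axis=1)  # column indices, masked entries skipped
row_argmin = X.argmin(axis=1)

# Back to a plain ndarray; a fully masked row would show up as NaN here.
row_max_plain = row_max.filled(np.nan)

print(row_max_plain)
print(row_min.filled(np.nan))
print(row_argmax)
print(row_argmin)
```

Note that you never have to pick a fill_value yourself for these reductions; the masked positions are simply left out of the comparison.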

Upvotes: 1
