Reputation: 2921
Say there's a np.float32 matrix A of shape (N, M). Together with A, I possess another matrix B, of type np.bool, of the exact same shape (elements from A can be mapped 1:1 to B). Example:
A =
[
[0.1, 0.2, 0.3],
[4.02, 123.4, 534.65],
[2.32, 22.0, 754.01],
[5.41, 23.1, 1245.5],
[6.07, 0.65, 22.12],
]
B =
[
[True, False, True],
[False, False, True],
[True, True, False],
[True, True, True],
[True, False, True],
]
Now, I'd like to perform np.max, np.min, np.argmax and np.argmin on axis=1 of A, but only considering elements A[i,j] for which B[i,j] == True. Is it possible to do something like this in NumPy? The for-loop version is trivial, but I'm wondering whether I can get some of that juicy NumPy speed.
The result for A, B and np.max (for example) would be:
[ 0.3, 534.65, 22.0, 1245.5, 22.12 ]
I've avoided ma because I've heard that the computation gets very slow and I don't feel like specifying fill_value makes sense in this context. I just want the numbers to be ignored.
Also, if it matters at all in my case, N is in the thousands and M is in the single digits.
Upvotes: 1
Views: 72
Reputation: 8142
This is a textbook application for masked arrays. But as always there are other ways to do it.
import numpy as np
A = np.array([[ 0.1, 0.2, 0.3],
[ 4.02, 123.4, 534.65],
[ 2.32, 22.0, 754.01],
[ 5.41, 23.1, 1245.5],
[ 6.07, 0.65, 22.12]])
B = np.array([[ True, False, True],
[False, False, True],
[ True, True, False],
[ True, True, True],
[ True, False, True]])
nanmax etc.
You could set the 'invalid' values to NaN (say), then use NumPy's special NaN-ignoring functions:
>>> A[~B] = np.nan # <-- Note this mutates A
>>> np.nanmax(A, axis=1)
array([3.0000e-01, 5.3465e+02, 2.2000e+01, 1.2455e+03, 2.2120e+01])
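If mutating A is undesirable, the same idea can be sketched with np.where, which builds a NaN-filled copy and leaves A alone. The name A_nan is introduced here purely for illustration. One caveat: np.nanargmax and np.nanargmin raise a ValueError for a row with no valid entries, which does not happen with this example data.

```python
import numpy as np

A = np.array([[0.1, 0.2, 0.3],
              [4.02, 123.4, 534.65],
              [2.32, 22.0, 754.01],
              [5.41, 23.1, 1245.5],
              [6.07, 0.65, 22.12]])
B = np.array([[True, False, True],
              [False, False, True],
              [True, True, False],
              [True, True, True],
              [True, False, True]])

# Copy of A with excluded entries replaced by NaN; A itself is untouched.
A_nan = np.where(B, A, np.nan)

# The four reductions from the question, ignoring the NaN entries.
row_max = np.nanmax(A_nan, axis=1)
row_min = np.nanmin(A_nan, axis=1)
row_argmax = np.nanargmax(A_nan, axis=1)  # column index of each row's max
row_argmin = np.nanargmin(A_nan, axis=1)  # column index of each row's min
```

The indices returned by the arg-functions refer to columns of the original A, so they can be used directly to index into it.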
The catch is that, while np.nanmax, np.nanmin, np.nanargmax, and np.nanargmin all exist, lots of functions don't have a NaN-ignoring twin, so you might have to come up with something else eventually.
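For the four reductions asked about here, one possible "something else" is to fill the excluded entries with a sentinel that can never win: -np.inf for max/argmax and +np.inf for min/argmin. This is only a sketch under stated assumptions: A is floating point, and every row has at least one valid entry (otherwise the sentinel itself comes back as the result). The names masked_low and masked_high are made up for this example.

```python
import numpy as np

A = np.array([[0.1, 0.2, 0.3],
              [4.02, 123.4, 534.65],
              [2.32, 22.0, 754.01],
              [5.41, 23.1, 1245.5],
              [6.07, 0.65, 22.12]])
B = np.array([[True, False, True],
              [False, False, True],
              [True, True, False],
              [True, True, True],
              [True, False, True]])

# Excluded entries become -inf, so they can never be the maximum.
masked_low = np.where(B, A, -np.inf)
row_max = masked_low.max(axis=1)
row_argmax = masked_low.argmax(axis=1)

# Excluded entries become +inf, so they can never be the minimum.
masked_high = np.where(B, A, np.inf)
row_min = masked_high.min(axis=1)
row_argmin = masked_high.argmin(axis=1)
```

This uses only plain ndarray operations, but unlike the NaN-ignoring functions it gives no error or warning when a row has nothing valid in it, so such rows would need a separate check (for example with B.any(axis=1)).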
ma
It seems weird not to mention masked arrays, which are straightforward. Notice that the mask is (to my mind anyway) 'backwards'. That is, True means the value is 'masked' or invalid and will be ignored. Hence having to negate B with the tilde. Then you can do what you want with the masked array:
>>> X = np.ma.masked_array(A, mask=~B) # <--- Note the tilde.
>>> np.max(X, axis=1)
masked_array(data=[0.3, 534.65, 22.0, 1245.5, 22.12],
mask=[False, False, False, False, False],
fill_value=1e+20)
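The other reductions from the question can be sketched the same way on the masked array. This assumes no row is entirely masked, which holds for the example data; a fully masked row would come back as a masked entry from max and min. The variable names are illustrative only.

```python
import numpy as np

A = np.array([[0.1, 0.2, 0.3],
              [4.02, 123.4, 534.65],
              [2.32, 22.0, 754.01],
              [5.41, 23.1, 1245.5],
              [6.07, 0.65, 22.12]])
B = np.array([[True, False, True],
              [False, False, True],
              [True, True, False],
              [True, True, True],
              [True, False, True]])

# True in the mask means "ignore this entry", hence the negation of B.
X = np.ma.masked_array(A, mask=~B)

row_max = X.max(axis=1)        # masked array of per-row maxima
row_min = X.min(axis=1)        # masked array of per-row minima
row_argmax = X.argmax(axis=1)  # column indices of the maxima
row_argmin = X.argmin(axis=1)  # column indices of the minima

# Convert a masked result to a plain ndarray; any masked entry becomes NaN.
row_max_plain = row_max.filled(np.nan)
```

Constructing X this way does not modify A, and the fill value passed to filled only matters for entries that are still masked in the result.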
Upvotes: 1