Reputation: 592
I have a NumPy array (mat) of shape (n,4). The array has four columns and a large number (n) of rows. The first three columns represent the x, y, z columns in my calculation. I wish to select those rows of the array where the x column has values below a given number (min_x) or above a given number (max_x), and where the y column has values below a given number (min_y) or above a given number (max_y), and where the z column has values below a given number (min_z) or above a given number (max_z).
This is how I am trying to implement this desired functionality presently:
import numpy as np
mark = np.where(((mat[:, 0] <= min_x) | (mat[:, 0] > max_x)) &
                ((mat[:, 1] <= min_y) | (mat[:, 1] > max_y)) &
                ((mat[:, 2] <= min_z) | (mat[:, 2] > max_z)))
# mark[0] holds row indices, so it indexes the first axis to select rows
mat_new = mat[mark[0]]
Is the technique that I am using correct, and the best way to achieve the desired functionality? I will greatly appreciate any help. Thanks.
Upvotes: 3
Views: 1946
Reputation: 40878
What you have now looks fine. But since you are asking about other ways to achieve the desired functionality: you can create a 1-dimensional boolean mask that is either True or False for each row index. Here is an example.
>>> import numpy as np
>>> np.random.seed(444)
>>> shape = 15, 4
>>> mat = np.random.randint(low=0, high=10, size=shape)
>>> mat
array([[3, 0, 7, 8],
[3, 4, 7, 6],
[8, 9, 2, 2],
[2, 0, 3, 8],
[0, 6, 6, 0],
[3, 0, 6, 7],
[9, 3, 8, 7],
[3, 2, 6, 9],
[2, 9, 8, 9],
[3, 2, 2, 8],
[1, 5, 6, 7],
[6, 0, 0, 0],
[0, 4, 8, 1],
[9, 8, 5, 8],
[9, 4, 6, 6]])
# The thresholds for x, y, z, respectively
>>> lower = np.array([5, 5, 4])
>>> upper = np.array([6, 6, 7])
>>> idx = len(lower)
# Parentheses are required here. NumPy boolean ops use | and &,
# which bind more tightly than comparisons such as < and > (unlike `or` and `and`)
>>> mask = np.all((mat[:, :idx] < lower) | (mat[:, :idx] > upper), axis=1)
>>> mask
array([False, False, True, True, False, False, True, False, True,
True, False, False, True, False, False])
Now indexing mat by mask will constrain it to row indices where mask is True:
>>> mat[mask]
array([[8, 9, 2, 2],
[2, 0, 3, 8],
[9, 3, 8, 7],
[2, 9, 8, 9],
[3, 2, 2, 8],
[0, 4, 8, 1]])
What is a bit different about this approach is that it is scalable: instead of specifying each coordinate condition individually, you can specify them in two arrays, one for the upper threshold and one for the lower, and then take advantage of NumPy's vectorization & broadcasting to build the mask.
np.all() says, test that all values are True, row-wise. It captures the "and" conditions from your question, while the | operator captures the "or".
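The same broadcasting idea can be applied with the question's own comparisons (<= on the lower side, > on the upper side). What follows is only a minimal sketch: the helper name select_outside and the sample values are made up for illustration and are not part of the example above.

```python
import numpy as np

def select_outside(mat, lower, upper):
    """Keep rows whose leading columns are all <= lower or > upper.

    Hypothetical helper: `lower` and `upper` hold one threshold per
    coordinate column and are broadcast against every row of `mat`.
    """
    k = len(lower)                      # number of coordinate columns to test
    coords = mat[:, :k]                 # drop any trailing non-coordinate columns
    mask = np.all((coords <= lower) | (coords > upper), axis=1)
    return mat[mask]

# Illustrative data, not taken from the example above
mat = np.array([[8, 9, 2, 2],
                [5, 5, 4, 1],
                [7, 7, 8, 3],
                [6, 0, 3, 8]])
lower = np.array([5, 5, 4])             # min_x, min_y, min_z
upper = np.array([6, 6, 7])             # max_x, max_y, max_z

result = select_outside(mat, lower, upper)
print(result)
```

The last row is dropped because its x value (6) is neither <= 5 nor > 6. Adding another coordinate only means appending one entry to each threshold array.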
Upvotes: 3
Reputation: 2726
I'd just drop the np.where and use the boolean mask instead:
x,y,z,_ = mat.T
mask = ( ( (x <= min_x) | (x > max_x) ) &
( (y <= min_y) | (y > max_y) ) &
( (z <= min_z) | (z > max_z) ) )
mat_new = mat[mask]
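To check that the mask and the np.where indices pick the same rows, here is a small sketch. The array and threshold values are assumptions chosen for illustration. Note that integer row indices belong in the first axis; putting them in the second axis would pick columns instead.

```python
import numpy as np

# Illustrative data and thresholds (assumed, not from the question)
mat = np.array([[0, 0, 0, 1],
                [5, 5, 5, 2],
                [9, 9, 9, 3],
                [0, 5, 9, 4]])
min_x = min_y = min_z = 2
max_x = max_y = max_z = 7

x, y, z, _ = mat.T                      # unpack the columns; the 4th is unused
mask = (((x <= min_x) | (x > max_x)) &
        ((y <= min_y) | (y > max_y)) &
        ((z <= min_z) | (z > max_z)))

rows = np.where(mask)[0]                # integer indices of the True entries
by_mask = mat[mask]                     # boolean indexing selects rows
by_index = mat[rows]                    # integer indexing on axis 0 selects the same rows
by_column = mat[:, rows]                # indexing axis 1 selects columns instead

print(by_mask)
print(np.array_equal(by_mask, by_index))
```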
Upvotes: 2
Reputation: 57033
Looks good to me. You can make it a bit more compact by comparing the columns to the midrange values:
mark = ((np.abs(mat[:, 0] - (max_x + min_x) / 2) > (max_x - min_x) / 2) &
        (np.abs(mat[:, 1] - (max_y + min_y) / 2) > (max_y - min_y) / 2) &
        (np.abs(mat[:, 2] - (max_z + min_z) / 2) > (max_z - min_z) / 2))
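Unfortunately, you cannot control the precise boundary conditions (< vs <=) anymore: both ends of each interval get the same kind of comparison, so you can no longer mix <= on one side with > on the other. Also, this is probably the slowest solution, even slower than the original one.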
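A one-column sketch of the boundary behaviour, using assumed sample values that are exactly representable in floating point: the midrange test agrees with the strict two-sided comparison, but differs from the mixed <= / > version at the lower boundary.

```python
import numpy as np

# Illustrative values; 2.0 and 6.0 sit exactly on the boundaries
x = np.array([1.0, 2.0, 3.0, 6.0, 7.0])
min_x, max_x = 2.0, 6.0

center = (max_x + min_x) / 2            # midpoint of the interval
half = (max_x - min_x) / 2              # half of the interval width

compact = np.abs(x - center) > half             # midrange form
explicit_strict = (x < min_x) | (x > max_x)     # strict on both sides
explicit_mixed = (x <= min_x) | (x > max_x)     # <= below, > above

print(compact)
print(explicit_strict)
print(explicit_mixed)
```

With non-integer thresholds, floating-point rounding in the midpoint arithmetic could also shift values that lie exactly on a boundary, so the explicit comparisons are the safer choice when boundaries matter.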
Upvotes: 3