Get subset of rows in the numpy matrix based on the values from the column of another matrix

Question

The title looks complicated, but the problem is not that hard. I have two matrices, data_X and data_Y. I need to construct a new matrix from data_X consisting of all the rows of data_X for which the corresponding value in the given column (column) of data_Y is not equal to someNumber, and likewise for data_Y. For example, here is a 5-by-2 data_X matrix and a 5-by-1 data_Y matrix, with column = 0 and someNumber = -1.

[[ 0.09580361  0.11221975]
 [ 0.71409124  0.24583188]
 [ 0.67346718  0.72550385]
 [ 0.40641294  0.01172211]
 [ 0.89974846  0.70378831]]  # data_X

and data_Y = np.array([[5], [-1], [4], [2], [-1]]).

The result would be:

[[ 0.09580361  0.11221975]
 [ 0.67346718  0.72550385]
 [ 0.40641294  0.01172211]]
[5 4 2]

It is not hard to see that this can be achieved by the following:

data_x, data_y = [], []
for i in range(len(data_Y)):  # range works on Python 2 and 3; xrange is Python 2 only
    if data_Y[i][column] != someNumber:
        data_y.append(data_Y[i][column])
        data_x.append(data_X[i])

But I believe there is a much easier way (something like 2 or 3 numpy operations) to get the results I need.

Divakar · Accepted Answer

Use boolean indexing: build a mask from Y, then flatten it with ravel() to select the matching rows of X.

In [228]: X
Out[228]: 
array([[ 0.09580361,  0.11221975],
       [ 0.71409124,  0.24583188],
       [ 0.67346718,  0.72550385],
       [ 0.40641294,  0.01172211],
       [ 0.89974846,  0.70378831]])

In [229]: Y
Out[229]: 
array([[ 5],
       [-1],
       [ 4],
       [ 2],
       [-1]])

In [230]: mask = Y!=-1 # Create mask for boolean indexing

In [231]: X[mask.ravel()]
Out[231]: 
array([[ 0.09580361,  0.11221975],
       [ 0.67346718,  0.72550385],
       [ 0.40641294,  0.01172211]])

In [232]: Y[mask]
Out[232]: array([5, 4, 2])
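
The session above hard-codes column 0 and the value -1. As a minimal sketch of the same boolean-indexing idea with the question's column and someNumber as parameters, something like the following should work. The helper name filter_rows and the argument name some_number are made up for illustration and are not part of NumPy. Slicing data_Y[:, column] gives a 1-D mask directly, so no ravel() call is needed.

```python
import numpy as np

def filter_rows(data_X, data_Y, column, some_number):
    """Hypothetical helper: keep rows where data_Y[:, column] != some_number."""
    # 1-D boolean mask, one entry per row
    mask = data_Y[:, column] != some_number
    # Index rows of data_X with the mask; take the same rows of the chosen column of data_Y
    return data_X[mask], data_Y[mask, column]

data_X = np.array([[0.09580361, 0.11221975],
                   [0.71409124, 0.24583188],
                   [0.67346718, 0.72550385],
                   [0.40641294, 0.01172211],
                   [0.89974846, 0.70378831]])
data_Y = np.array([[5], [-1], [4], [2], [-1]])

new_X, new_Y = filter_rows(data_X, data_Y, column=0, some_number=-1)
print(new_X)  # rows 0, 2 and 3 of data_X
print(new_Y)  # [5 4 2]
```

Note that in recent NumPy versions, indexing X directly with the 2-D mask of shape (5, 1) raises an IndexError because the mask shape does not match X, which is why the mask has to be one-dimensional when selecting rows.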
