MJBoa

Reputation: 175

Boolean indexing of 2D ndarray based on row comparison

I have a pair of 2D arrays of the same shape, (n, 3). I want to select rows from the first based on a comparison against the second. My idea was the following:

data[labels == row]

where row is a vector of length 3. The inner boolean comparison gives an array of shape (n, 3), so the indexing returns a flat 1D array.

My problem is that I then have to either reshape the result manually or apply something like np.all to labels == row.

This actually works correctly if data is a pandas DataFrame, though. What's the proper way to do this with pure ndarrays?

Upvotes: 1

Views: 100

Answers (1)

unutbu

Reputation: 879163

Use (labels == row).all(axis=1) to select rows where all the values match:

import numpy as np
np.random.seed(2016)

labels = np.random.randint(10, size=(10, 3))
data = np.random.randint(10, size=(10, 3))
# data looks like this:
# array([[0, 8, 2],
#        [3, 2, 2],
#        [4, 0, 9],
#        [0, 4, 9],
#        [5, 5, 1],
#        [7, 8, 0],
#        [0, 9, 5],
#        [0, 6, 2],
#        [0, 0, 5],
#        [5, 0, 7]])

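# Overwrite rows 0, 3, 6 and 9 of labels with row, so there are known full-row matches
row = labels[0].copy()
labels[::3] = row
# Keep only the rows of data whose label row equals row in every column
data[(labels == row).all(axis=1)]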

yields

array([[0, 8, 2],
       [0, 4, 9],
       [0, 9, 5],
       [5, 0, 7]])
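
For a case small enough to check by hand, here is a minimal sketch with hand-written arrays. The values and the helper name select_matching_rows are made up for illustration and do not come from the question.

```python
import numpy as np

def select_matching_rows(data, labels, row):
    """Return the rows of data whose corresponding row in labels equals row.

    Hypothetical helper for illustration; assumes data and labels both have
    shape (n, k) and row has shape (k,).
    """
    # labels == row broadcasts row across every row of labels -> shape (n, k)
    # .all(axis=1) collapses each row to a single boolean -> shape (n,)
    mask = (labels == row).all(axis=1)
    # A 1D boolean mask on a 2D array selects whole rows -> shape (m, k)
    return data[mask]

labels = np.array([[1, 2, 3],
                   [1, 0, 0],
                   [1, 2, 3],
                   [0, 2, 0]])
data = np.array([[10, 11, 12],
                 [20, 21, 22],
                 [30, 31, 32],
                 [40, 41, 42]])
row = np.array([1, 2, 3])

selected = select_matching_rows(data, labels, row)
print(selected)
# [[10 11 12]
#  [30 31 32]]
```

Rows 1 and 3 of labels each share one value with row, but neither matches in all three columns, so their data rows are left out.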

Notice that the boolean array labels == row has some True values on rows which are not complete matches:

In [138]: labels == row
Out[138]: 
array([[ True,  True,  True],
       [ True, False, False],    # <-- a lone True value
       [False,  True, False],    # <--
       [ True,  True,  True],
       [False, False, False],
       [False, False,  True],    # <--
       [ True,  True,  True],
       [False, False, False],
       [False, False, False],
       [ True,  True,  True]], dtype=bool)

So data[labels == row] returns some values not associated with a complete row-match:

In [141]: data[labels == row]
Out[141]: array([0, 8, 2, 3, 0, 0, 4, 9, 0, 0, 9, 5, 5, 0, 7])
                          ^  ^           ^
                          |  |           |
                          not related to a complete row match
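
The difference comes down to the shape of the mask. The following sketch uses small made-up arrays (not the random ones above) to compare the element-wise mask with the row-wise mask side by side:

```python
import numpy as np

# Made-up values, chosen so the result is easy to verify by eye
labels = np.array([[1, 2, 3],
                   [1, 0, 0],
                   [1, 2, 3]])
data = np.array([[10, 11, 12],
                 [20, 21, 22],
                 [30, 31, 32]])
row = np.array([1, 2, 3])

elementwise = labels == row          # shape (3, 3): one boolean per element
rowwise = elementwise.all(axis=1)    # shape (3,): one boolean per row

flat = data[elementwise]   # picks individual elements, result is 1D
rows = data[rowwise]       # picks whole rows, result stays 2D

print(flat)        # [10 11 12 20 30 31 32]
print(rows.shape)  # (2, 3)
```

The stray 20 in flat comes from the lone True in the second row of the element-wise mask. The row-wise mask drops that row entirely.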

Upvotes: 2
