Reputation: 175
I have a pair of 2D arrays of the same dimensions, (n, 3). I want to select from the first based on indexing with the second. My idea would be the following:
data[labels == row]
where row is a vector of length 3.
The inner boolean comparison gives an array of shape (n, 3). The indexing gives a flat 1d array.
My problem, then, is that I have to either reshape the array manually or use something like np.all on the array labels == row.
This actually works correctly, though, if data is a pandas DataFrame.
What's the proper way to do this with pure ndarrays?
Upvotes: 1
Views: 100
Reputation: 879163
Use (labels == row).all(axis=1) to select rows where all the values match:
import numpy as np
np.random.seed(2016)
labels = np.random.randint(10, size=(10, 3))
data = np.random.randint(10, size=(10, 3))
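# data is:
# array([[0, 8, 2],
#        [3, 2, 2],
#        [4, 0, 9],
#        [0, 4, 9],
#        [5, 5, 1],
#        [7, 8, 0],
#        [0, 9, 5],
#        [0, 6, 2],
#        [0, 0, 5],
#        [5, 0, 7]])
# Chained assignment: row becomes labels[0], and that row is also copied
# into every third row of labels, so rows 0, 3, 6 and 9 match completely.
row = labels[::3] = labels[0]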
data[(labels == row).all(axis=1)]
yields
array([[0, 8, 2],
       [0, 4, 9],
       [0, 9, 5],
       [5, 0, 7]])
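As a minimal sketch of the same idea with small hand-written arrays (the values of labels, data and row below are made up for illustration and are not the seeded random arrays above), the intermediate shapes can be checked step by step:

```python
import numpy as np

# Hypothetical, hand-written inputs (not the seeded random arrays above).
labels = np.array([[1, 2, 3],
                   [1, 0, 0],
                   [1, 2, 3],
                   [0, 2, 0]])
data = np.array([[10, 11, 12],
                 [20, 21, 22],
                 [30, 31, 32],
                 [40, 41, 42]])
row = np.array([1, 2, 3])

# Broadcasting compares row against every row of labels -> shape (4, 3).
elementwise = labels == row

# Reduce along axis 1: True only where all three columns match -> shape (4,).
row_mask = elementwise.all(axis=1)

# A 1D boolean mask on the first axis picks whole rows, so the result stays 2D.
selected = data[row_mask]
print(selected.shape)
print(selected)
```

Because the mask is one-dimensional, no manual reshape is needed afterwards.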
Notice that the boolean array labels == row has some True values on rows which are not complete matches:
In [138]: labels == row
Out[138]:
array([[ True,  True,  True],
       [ True, False, False],   # <-- a lone True value
       [False,  True, False],   # <--
       [ True,  True,  True],
       [False, False, False],
       [False, False,  True],   # <--
       [ True,  True,  True],
       [False, False, False],
       [False, False, False],
       [ True,  True,  True]], dtype=bool)
So data[labels == row] returns some values not associated with a complete row-match:
In [141]: data[labels == row]
Out[141]: array([0, 8, 2, 3, 0, 0, 4, 9, 0, 0, 9, 5, 5, 0, 7])
                          ^  ^           ^
                          |  |           |
                          not related to a complete row match
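The difference between the two kinds of mask can also be sketched side by side. The arrays below are hypothetical and chosen so that some rows match only partially; np.flatnonzero is included only as one possible way to recover the indices of the fully matching rows, should those be needed:

```python
import numpy as np

# Hypothetical inputs chosen so that rows 1 and 2 match row only partially.
labels = np.array([[5, 6, 7],
                   [5, 0, 0],
                   [0, 0, 7],
                   [5, 6, 7]])
data = np.arange(12).reshape(4, 3)
row = np.array([5, 6, 7])

elementwise = labels == row          # 2D boolean mask, shape (4, 3)
row_mask = elementwise.all(axis=1)   # 1D boolean mask, shape (4,)

# A 2D mask returns one value per True cell, flattened, partial matches included.
flat = data[elementwise]

# A 1D mask returns only the rows that match in every column, still 2D.
rows = data[row_mask]

# Positions of the fully matching rows.
matching_indices = np.flatnonzero(row_mask)

print(flat)
print(rows)
print(matching_indices)
```

Here flat contains two extra values (from rows 1 and 2) that do not belong to any complete row match, while rows contains only rows 0 and 3.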
Upvotes: 2