Reputation: 429
I have 4 arrays. Array X is a 2D array that contains examples (each with 3 features):
X = array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20, 21]])
Array Y contains labels for examples in Array X:
Y = array([11, 44, 77, 22, 77, 22, 22])
Arrays L & R contain subsets of the labels
L = array([11, 44])
R = array([77, 22])
I want to slice both X and Y according to the labels in L and R. So the output should be:
XL = array([[1, 2, 3], [4, 5, 6]])
XR = array([[7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20, 21]])
YL = array([11, 44])
YR = array([77, 22, 77, 22, 22])
I know I can do something like the following to extract the rows I want based on a single value:
Y[Y==i]
X[Y==i, :]
However, i here is a single value, but in my question it is another array (e.g., L and R).
I want an efficient way to do that in Python 3. Any hints?
Upvotes: 0
Views: 74
Reputation: 2656
Using np.isin:
import numpy as np
X = np.asarray([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20, 21]])
Y = np.asarray([11, 44, 77, 22, 77, 22, 22])
L = np.asarray([11, 44])
R = np.asarray([77, 22])
mask_L = np.isin(Y, L)
mask_R = np.isin(Y, R)
print(X[mask_L,:]) # output: array([[1, 2, 3], [4, 5, 6]])
print(X[mask_R,:]) # output: array([[ 7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20, 21]])
print(Y[mask_L]) # output: array([11, 44])
print(Y[mask_R]) # output: array([77, 22, 77, 22, 22])
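If the same split is needed for several label sets, the masking above could be wrapped in a small helper. This is only a sketch: the name split_by_labels is hypothetical (it is not part of NumPy), and it assumes X and Y have the same length along the first axis.

```python
import numpy as np

def split_by_labels(X, Y, labels):
    """Return the rows of X and the entries of Y whose label is in `labels`.

    Hypothetical helper for illustration; assumes len(X) == len(Y).
    """
    mask = np.isin(Y, labels)  # boolean mask, True where the label is in `labels`
    return X[mask], Y[mask]

# Same data as in the question: 7 examples with 3 features each
X = np.arange(1, 22).reshape(7, 3)
Y = np.array([11, 44, 77, 22, 77, 22, 22])

XL, YL = split_by_labels(X, Y, [11, 44])
XR, YR = split_by_labels(X, Y, [77, 22])
```

Because the mask is computed once and reused for both arrays, the rows of X and the labels in Y stay aligned and keep their original order.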
Upvotes: 1
Reputation: 379
You can also do it with plain list comprehensions:
from numpy import array
X = array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20, 21]])
Y = array([11, 44, 77, 22, 77, 22, 22])
L = array([11, 44])
R = array([77, 22])
XL = array([x for x, y in zip(X, Y) if y in L])
XR = array([x for x, y in zip(X, Y) if y in R])
YL = array([y for y in Y if y in L])
YR = array([y for y in Y if y in R])
# Output
# XL = array([[1, 2, 3], [4, 5, 6]])
# XR = array([[7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20, 21]])
# YL = array([11, 44])
# YR = array([77, 22, 77, 22, 22])
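One thing to keep in mind with the comprehensions above: y in L scans the whole array L for every element of Y. A possible variation, sketched here under the assumption that the labels are hashable scalars such as integers, builds a Python set first and reuses one boolean mask for both X and Y. The names L_set and keep are only illustrative.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12],
              [13, 14, 15], [16, 17, 18], [19, 20, 21]])
Y = np.array([11, 44, 77, 22, 77, 22, 22])
L = np.array([11, 44])

# Convert the labels to a set of plain Python ints for fast membership tests
L_set = set(L.tolist())

# One boolean per example: True if its label is in L
keep = np.array([y in L_set for y in Y.tolist()], dtype=bool)

XL = X[keep]  # rows of X whose label is in L
YL = Y[keep]  # matching labels, in the original order
```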
Hope this helps :)
Upvotes: 1