Reputation: 1006
My question is similar to "testing whether a NumPy array contains a given row", but I need a non-trivial extension of the method offered there. The linked question asks how to check whether each row of an array equals a single given row. I need to do the same check against several rows at once, and one does not obviously follow from the other.
Say I have an array:
array = np.array([[1, 2, 4], [3, 5, 1], [5, 5, 1], [1, 2, 1]])
I want to know if each row of this array is in a secondary array given by:
check_array = np.array([[1, 2, 4], [1, 2, 1]])
Ideally this would look something like this:
is_in_check = array in check_array
Where is_in_check looks like this:
is_in_check = np.array([True, False, False, True])
I realise for very small arrays it would be easier to use a list comprehension or something similar, but the process has to be performant with arrays on the order of 10^6 rows.
I have seen that the suggested way to check for a single row is:
is_in_check_single = any((array[:]==[1, 2, 1]).all(1))
But ideally I'd like to generalise this over multiple rows so that the process is vectorized.
In practice, I would expect to see the following dimensions for each array:
array.shape = (1000000, 3)
check_array.shape = (5, 3)
Upvotes: 4
Views: 1640
Reputation: 35626
Broadcasting is an option:
import numpy as np
array = np.array([[1, 2, 4], [3, 5, 1], [5, 5, 1], [1, 2, 1]])
check_array = np.array([[1, 2, 4], [1, 2, 1]])
is_in_check = (check_array[:, None] == array).all(axis=2).any(axis=0)
Produces:
[ True False False  True]
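To see why this works, it can help to look at the intermediate shapes. The following is a minimal sketch that splits the one-liner into separate steps; the names `expanded`, `elementwise`, and `row_matches` are illustrative only and not part of the original expression.

```python
import numpy as np

array = np.array([[1, 2, 4], [3, 5, 1], [5, 5, 1], [1, 2, 1]])
check_array = np.array([[1, 2, 4], [1, 2, 1]])

# Insert a new axis so the shapes can broadcast: (2, 3) -> (2, 1, 3)
expanded = check_array[:, None]

# Broadcasting (2, 1, 3) against (4, 3) compares every check row
# with every row of `array`, element by element -> shape (2, 4, 3)
elementwise = expanded == array

# A pair of rows matches only if all 3 columns agree -> shape (2, 4)
row_matches = elementwise.all(axis=2)

# A row of `array` is "in" check_array if it matches any check row -> shape (4,)
is_in_check = row_matches.any(axis=0)

print(elementwise.shape, row_matches.shape, is_in_check.shape)
```

The result has one entry per row of `array`, in the same order as the rows.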
Broadcasting the other way:
is_in_check = (check_array == array[:, None]).all(axis=2).any(axis=1)
Also produces:
[ True False False  True]
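Both expressions above build a temporary boolean array with one element per (check row, array row, column) combination. For the shapes in the question, (1000000, 3) and (5, 3), that is 15 million booleans, or roughly 15 MB. If that intermediate ever becomes a concern, one possible alternative (a different technique from the broadcasting shown above) is to loop over the few check rows and reuse the single-row comparison from the question. The sketch below assumes `check_array` has only a handful of rows; `rows_in` is a hypothetical helper name, not a NumPy function.

```python
import numpy as np


def rows_in(array, check_array):
    """Hypothetical helper: for each row of `array`, report whether it
    equals any row of `check_array`.

    Assumes both inputs are 2-D with the same number of columns and that
    `check_array` has few rows, so a Python-level loop over it is cheap.
    """
    result = np.zeros(len(array), dtype=bool)
    for row in check_array:
        # Single-row comparison, accumulated with a logical OR
        result |= (array == row).all(axis=1)
    return result


array = np.array([[1, 2, 4], [3, 5, 1], [5, 5, 1], [1, 2, 1]])
check_array = np.array([[1, 2, 4], [1, 2, 1]])

is_in_check = rows_in(array, check_array)
print(is_in_check)
```

Each pass through the loop only allocates temporaries the size of `array`. Whether this is actually faster than the broadcast version would need to be measured on the real data.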
Upvotes: 7