Connor

Reputation: 1006

How do you check if each row of a NumPy array is contained in a secondary array?

My question is similar to "testing whether a Numpy array contains a given row", but I need a non-trivial extension of the method offered there. The linked question asks how to check whether each row in an array is the same as a single other row. Here I need to do that for many rows at once, and one does not obviously follow from the other.

Say I have an array:

array = np.array([[1, 2, 4], [3, 5, 1], [5, 5, 1], [1, 2, 1]])

I want to know if each row of this array is in a secondary array given by:

check_array = np.array([[1, 2, 4], [1, 2, 1]])

Ideally this would look something like this:

is_in_check = array in check_array

Where is_in_check looks like this:

is_in_check = np.array([True, False, False, True])

I realise that for very small arrays it would be easier to use a list comprehension or something similar, but the process has to be performant with arrays on the order of 10^6 rows.

I have seen that for checking for a single row the correct method is:

is_in_check_single = any((array[:]==[1, 2, 1]).all(1))

But ideally I'd like to generalise this over multiple rows so that the process is vectorized.

In practice, I would expect to see the following dimensions for each array:

array.shape = (1000000, 3)
check_array.shape = (5, 3)

Upvotes: 4

Views: 1640

Answers (1)

Henry Ecker

Reputation: 35626

Broadcasting is an option:

import numpy as np

array = np.array([[1, 2, 4], [3, 5, 1], [5, 5, 1], [1, 2, 1]])

check_array = np.array([[1, 2, 4], [1, 2, 1]])
is_in_check = (check_array[:, None] == array).all(axis=2).any(axis=0)

Produces:

[ True False False  True]
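
To see why this works, it can help to look at the intermediate shapes. The following is a step-by-step sketch of the same expression; the names `expanded`, `elementwise`, and `row_matches` are introduced here purely for illustration.

```python
import numpy as np

array = np.array([[1, 2, 4], [3, 5, 1], [5, 5, 1], [1, 2, 1]])
check_array = np.array([[1, 2, 4], [1, 2, 1]])

# (2, 3) -> (2, 1, 3): add an axis so each check row can be compared with every row of array
expanded = check_array[:, None]

# Broadcasting (2, 1, 3) against (4, 3) gives an element-wise comparison of shape (2, 4, 3)
elementwise = expanded == array

# A row matches a check row only if all 3 of its elements match: shape (2, 4)
row_matches = elementwise.all(axis=2)

# A row of array is "in" check_array if it matches any of the check rows: shape (4,)
is_in_check = row_matches.any(axis=0)

print(elementwise.shape, row_matches.shape, is_in_check.shape)
print(is_in_check)
```

The final result has one entry per row of `array`, which is the layout asked for in the question.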

Broadcasting the other way:

is_in_check = (check_array == array[:, None]).all(axis=2).any(axis=1)

Also produces:

[ True False False  True]
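
Both broadcasting variants build an intermediate boolean array with one element per (check row, array row, column) combination. For the shapes in the question, (1000000, 3) and (5, 3), that is about 15 million booleans, which is usually fine. If `check_array` were much larger, one possible alternative is to loop over the check rows and accumulate the result, which keeps the temporary arrays at the size of `array`. This is only a sketch, and `rows_in` is a hypothetical helper name, not a NumPy function.

```python
import numpy as np


def rows_in(array, check_array):
    """Hypothetical helper: True for each row of array that equals some row of check_array."""
    result = np.zeros(len(array), dtype=bool)
    for row in check_array:
        # Same single-row test as in the question, OR-ed into the running result
        result |= (array == row).all(axis=1)
    return result


array = np.array([[1, 2, 4], [3, 5, 1], [5, 5, 1], [1, 2, 1]])
check_array = np.array([[1, 2, 4], [1, 2, 1]])

print(rows_in(array, check_array))
```

The Python-level loop runs once per row of `check_array` (5 iterations in the question's case), while the comparison against every row of `array` stays vectorized.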

Upvotes: 7
