Eric Truett

Reputation: 3010

NumPy: test whether each value in a row is in the corresponding row of another array

I have two arrays of the same shape (2500, 9). I am trying to figure out the most efficient way to test if each value in a row in array1 is in the corresponding row in array2. Consider the following simplified example:

>>> array1 = np.array([[1, 2, 3], [4, 5, 6], [3, 8, 9]])
>>> array2 = np.array([[0, 2, 4], [3, 5, 6], [6, 8, 9]])
>>> comparison_func(array1, array2)
array([[False,  True, False],
       [False,  True,  True],
       [False,  True,  True]])

I can accomplish this by iterating over each value in array1 and testing whether that value is in the corresponding row of array2:

>>> rows, columns = array1.shape
>>> np.array([array1[row, column] in array2[row, :]
...           for row in range(rows)
...           for column in range(columns)]).reshape(array1.shape)
array([[False,  True, False],
       [False,  True,  True],
       [False,  True,  True]])

I would like to know if there is a more efficient way to do this in NumPy. I tried various combinations of np.isin and np.in1d, but could not limit the comparison of a scalar from array1 to the corresponding row in array2. Thanks in advance for any suggestions.

Upvotes: 1

Views: 873

Answers (2)

Ehsan

Reputation: 12397

Use NumPy broadcasting, adding a new axis so that each element of array1 is compared with all elements of the corresponding row in array2 (I assume you do not care about the position of the element within that row):

(array1[...,None]==array2[:,None,:]).any(-1)

output:

[[False  True False]
 [False  True  True]
 [False  True  True]]

Comparison:

#@U11-Forward's solution
def m1(array1, array2):
  return [[array2[i][x] == y for x, y in enumerate(v)] for i, v in enumerate(array1)]

#@Ehsan's solution
def m2(array1, array2):
  return (array1[...,None]==array2[:,None,:]).any(-1)


in_ = {n:[np.random.randint(10,size=(n,100)), np.random.randint(10,size=(n,100))] for n in [10,100,1000,10000]}

output:

[Image not available: benchmark plot comparing m1 and m2]

m2 seems to be faster for this input.
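Since the plot is not reproduced here, the following is a rough sketch of how such a timing could be run with the standard-library timeit module. It is not the benchmark that produced the original plot: the sizes, repeat counts, and the use of np.random.default_rng are assumptions chosen to keep the run short. Note also that m1 compares position by position while m2 tests row membership, so only the timings, not the results, are comparable.

```python
import timeit

import numpy as np


def m1(array1, array2):
    # Position-by-position comparison using nested Python loops.
    return [[array2[i][x] == y for x, y in enumerate(v)] for i, v in enumerate(array1)]


def m2(array1, array2):
    # Broadcast comparison followed by a reduction over the last axis.
    return (array1[..., None] == array2[:, None, :]).any(-1)


rng = np.random.default_rng(0)  # fixed seed, an arbitrary choice for repeatability
timings = {}
for n in [10, 100]:  # smaller sizes than in the answer, to keep the run short
    a = rng.integers(10, size=(n, 100))
    b = rng.integers(10, size=(n, 100))
    timings[n] = {
        f.__name__: min(timeit.repeat(lambda: f(a, b), number=3, repeat=3))
        for f in (m1, m2)
    }

for n, result in timings.items():
    print(n, result)
```

The absolute numbers depend on the machine and NumPy version, so treat them as indicative only.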


Explanation: None in indexing is an alias for np.newaxis. Wherever you insert None, NumPy creates an extra dimension of length 1 in that position (a.k.a. a new axis). For a 2-D array, array1[...,None] is the same as array1[:,:,None].
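As a small illustration of the shapes involved, here is a sketch using the arrays from the question; the names left and right are introduced only for this example:

```python
import numpy as np

array1 = np.array([[1, 2, 3], [4, 5, 6], [3, 8, 9]])
array2 = np.array([[0, 2, 4], [3, 5, 6], [6, 8, 9]])

# None and np.newaxis are the same object, so both spellings index identically.
left = array1[..., None]     # new trailing axis of length 1
right = array2[:, None, :]   # new middle axis of length 1

print(np.newaxis is None)                               # True
print(left.shape, right.shape)                          # (3, 3, 1) (3, 1, 3)
print(np.array_equal(left, array1[:, :, np.newaxis]))   # True
```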

Now, the comparison array1[...,None]==array2[:,None,:] uses NumPy broadcasting to compare each element of array1 with each element of array2 in the same row. The output has an extra dimension. To check whether each element of array1 is in the corresponding row of array2, it is enough to see if it is equal to any element of array2 in that row, hence any(-1). In Python, -1 refers to the last index (here the last axis, which corresponds to array2's columns in the same row).
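To make the difference between row membership and position-by-position equality concrete, here is a minimal sketch. The function name rowwise_isin and the small arrays a and b are made up for this illustration; they are chosen so that the two comparisons give different answers:

```python
import numpy as np


def rowwise_isin(a, b):
    """For each a[i, j], report whether it occurs anywhere in row b[i]."""
    # (n, m, 1) == (n, 1, k) broadcasts to (n, m, k); any(-1) reduces over k.
    return (a[..., None] == b[:, None, :]).any(-1)


a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[3, 1, 0], [6, 6, 4]])

membership = rowwise_isin(a, b)
positional = a == b

print(membership)
# [[ True False  True]
#  [ True False  True]]
print(positional)
# [[False False False]
#  [False False False]]
```

One design consideration: the intermediate boolean array has n * m * k elements. For the (2500, 9) arrays in the question that is 2500 * 9 * 9 = 202,500 booleans, which is small, but for much wider rows the temporary array can become the limiting factor.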

Upvotes: 0

U13-Forward

Reputation: 71560

NumPy's == operator does it:

>>> array1 = np.array([[1, 2, 3], [4, 5, 6], [3, 8, 9]])
>>> array2 = np.array([[0, 2, 4], [3, 5, 6], [6, 8, 9]])
>>> print(array1 == array2)
[[False  True False]
 [False  True  True]
 [False  True  True]]
>>>

NumPy does this by overriding the __eq__ method of its array type, so that == compares element by element and returns a boolean array, as shown above.

Here is an example of how you could do the same thing in a Python class:

class array:
    def __init__(self, lst):
        self.lst = lst
    def __eq__(self, other):
        self.final = [[other[i][x] == y for x, y in enumerate(v)] for i, v in enumerate(self.lst)]
        return self.final
print(array([[1, 2, 3], [4, 5, 6], [3, 8, 9]]) == [[0, 2, 4], [3, 5, 6], [6, 8, 9]])

Output:

[[False, True, False], [False, True, True], [False, True, True]]

Upvotes: 2
