J.P. Le Cavalier

Reputation: 1345

Intersection of 2-d numpy arrays

I am looking for a way to get the intersection between two 2-dimensional numpy.array of shape (n_1, m) and (n_2, m). Note that n_1 and n_2 can differ but m is the same for both arrays. Here are two minimal examples with the expected results:

import numpy as np

array1a = np.array([[2], [2], [5], [1]])
array1b = np.array([[5], [2]])

array_intersect(array1a, array1b)
##  array([[2],
##         [5]])


array2a = np.array([[1, 2], [3, 3], [2, 1], [1, 3], [2, 1]])
array2b = np.array([[2, 1], [1, 4], [3, 3]])

array_intersect(array2a, array2b)
##  array([[2, 1],
##         [3, 3]])

If someone has a clue on how I should implement the array_intersect function, I would be very grateful!

Upvotes: 1

Views: 3644

Answers (6)

yazan sayed

Reputation: 1139

import numpy as np

# Build 10000 distinct rows, then shuffle a copy so the row order differs
arr1 = np.arange(20000).reshape(-1, 2)
arr2 = arr1.copy()
np.random.shuffle(arr2)
print(len(arr1))  # 10000

%%timeit
res = np.array([x
   for x in set(tuple(x) for x in arr1) & set(tuple(x) for x in arr2)
])
##  83.7 ms ± 16.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Upvotes: 0

Eelco Hoogendoorn

Reputation: 10759

The numpy-indexed package (disclaimer: I am its author) was created with the exact purpose of providing such functionality in an expressive and efficient manner:

import numpy_indexed as npi
npi.intersection(a, b)

Note that the implementation is fully vectorized; that is, there are no loops over the arrays in Python.
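
For readers who would rather avoid an extra dependency, a similar loop-free result can be sketched in plain NumPy. This is not how numpy-indexed is implemented; the helper name intersect_rows is made up for illustration, and the sketch assumes both arrays are 2-D with the same number of columns and a compatible dtype.

```python
import numpy as np

def intersect_rows(a, b):
    """Sorted unique rows present in both 2-D arrays (hypothetical helper)."""
    # Deduplicate each input so a row repeated within one array counts once.
    unique_a = np.unique(a, axis=0)
    unique_b = np.unique(b, axis=0)
    # After stacking, a row that occurs twice must come from both inputs.
    stacked = np.concatenate([unique_a, unique_b], axis=0)
    rows, counts = np.unique(stacked, axis=0, return_counts=True)
    return rows[counts > 1]

array2a = np.array([[1, 2], [3, 3], [2, 1], [1, 3], [2, 1]])
array2b = np.array([[2, 1], [1, 4], [3, 3]])
result = intersect_rows(array2a, array2b)
print(result)
```

The rows come back in lexicographic order because np.unique sorts them, which matches the expected output in the question.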

Upvotes: 0

aparpara

Reputation: 2201

Construct a set of tuples from the first array and test each row of the second array against it, or vice versa.

def array_intersect(a, b):
    s = {tuple(x) for x in a}
    return np.unique([x for x in b if tuple(x) in s], axis=0)
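
As a possible refinement, here is a sketch of the same set-of-tuples idea that also returns a well-shaped array when the two inputs share no rows. The name array_intersect_safe and the early return are my own assumptions, not part of the answer above; the sketch assumes b has at least one column.

```python
import numpy as np

def array_intersect_safe(a, b):
    """Sorted unique rows of b that also occur in a (hypothetical variant)."""
    rows_of_a = {tuple(row) for row in a}
    common = [row for row in b if tuple(row) in rows_of_a]
    # Reshape so that an empty result keeps the column count and dtype of b.
    common = np.array(common, dtype=b.dtype).reshape(-1, b.shape[1])
    if common.shape[0] == 0:
        return common
    return np.unique(common, axis=0)

array1a = np.array([[2], [2], [5], [1]])
array1b = np.array([[5], [2]])
array2a = np.array([[1, 2], [3, 3], [2, 1], [1, 3], [2, 1]])
array2b = np.array([[2, 1], [1, 4], [3, 3]])

result1 = array_intersect_safe(array1a, array1b)
result2 = array_intersect_safe(array2a, array2b)
no_overlap = array_intersect_safe(array2a, np.array([[9, 9]]))
```

Building the set from the smaller array and iterating over the larger one keeps the set small; either order gives the same rows.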

Upvotes: 0

subnivean

Reputation: 1152

Here's a way to do it without any loops or list comprehensions, assuming you have scipy installed (I haven't tested it for speed):

In [31]: from scipy.spatial.distance import cdist

In [32]: np.unique(array1a[np.where(cdist(array1a, array1b) == 0)[0]], axis=0)
Out[32]: 
array([[2],
       [5]])

In [33]: np.unique(array2a[np.where(cdist(array2a, array2b) == 0)[0]], axis=0)
Out[33]: 
array([[2, 1],
       [3, 3]])

Upvotes: 0

cvanelteren

Reputation: 1703

Another approach would be to harness NumPy's broadcasting feature:

import numpy as np

array2a = np.array([[1, 2], [3, 3], [2, 1], [1, 3], [2, 1]])
array2b = np.array([[2, 1], [1, 4], [3, 3]])

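# Compare every row of array2a with every row of array2b: shape (5, 3, 2)
test = array2a[:, None] == array2b
# Keep a row of array2b only if a single row of array2a equals it in every column
print(array2b[test.all(axis=2).any(axis=0)]) # [[2 1]
                                             #  [3 3]]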

but this is less readable imo. Edit: alternatively, combine np.unique with a set of tuples. In short, there are many options!
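
To make the broadcasting idea reusable, it could be wrapped in a small function along these lines. The name intersect_rows_broadcast is hypothetical. The sketch reduces over the column axis before the row axis, so a row of b counts as matched only when one single row of a equals it in every column; checking each column separately would wrongly accept a row such as [3, 1], whose values appear in a but in different rows.

```python
import numpy as np

def intersect_rows_broadcast(a, b):
    """Rows of b that also occur in a, deduplicated (hypothetical helper)."""
    # equal[i, j, k] is True when a[i, k] == b[j, k]; shape (n_1, n_2, m).
    equal = a[:, None, :] == b[None, :, :]
    # Reduce over columns first (whole-row equality), then over rows of a.
    row_in_a = equal.all(axis=2).any(axis=0)
    return np.unique(b[row_in_a], axis=0)

array2a = np.array([[1, 2], [3, 3], [2, 1], [1, 3], [2, 1]])
# [3, 1] shares each value with some row of array2a, but never in one row.
tricky_b = np.array([[2, 1], [3, 1], [3, 3], [2, 1]])
result = intersect_rows_broadcast(array2a, tricky_b)
```

Keep in mind that the intermediate boolean array has n_1 * n_2 * m elements, so this approach is best suited to small inputs.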

Upvotes: 0

cvanelteren

Reputation: 1703

How about using sets?

import numpy as np

array2a = np.array([[1, 2], [3, 3], [2, 1], [1, 3], [2, 1]])
array2b = np.array([[2, 1], [1, 4], [3, 3]])


a = set((tuple(i) for i in array2a))
b = set((tuple(i) for i in array2b))

a.intersection(b) # {(2, 1), (3, 3)}

Upvotes: 2
