frank

Reputation: 379

Efficient way to check, for each element of one array, whether it exists in another array

I have two large 1-D NumPy arrays of around 400K elements each. For each element in array A, I need to check whether it exists in array B. I used np.in1d, but it is too slow. Is there any way to speed this up?

import numpy as np

A = np.array([1, 2, 3, 4, 5, 6, 7])
B = np.array([3, 4, 7])
result = np.in1d(A, B, invert=True)  # True where the element of A is NOT in B
result
>> array([ True,  True, False, False,  True,  True, False])

Upvotes: 1

Views: 113

Answers (2)

koPytok

Reputation: 3733

I prefer pandas for that task:

import pandas as pd

A, B = pd.DataFrame(A), pd.DataFrame(B)  # one column each, labelled 0
A.merge(B, on=0, how="left", indicator=True)  # _merge says where each row matched

   0     _merge
0  1  left_only
1  2  left_only
2  3       both
3  4       both
4  5  left_only
5  6  left_only
6  7       both
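A self-contained sketch of the merge above that turns the _merge indicator into the same boolean mask as np.in1d(A, B, invert=True). The drop_duplicates call on B is an added step, not part of the original answer: a left merge emits one row per match, so repeated values in B would otherwise duplicate rows of A and the mask would no longer line up with A.

```python
import numpy as np
import pandas as pd

A = np.array([1, 2, 3, 4, 5, 6, 7])
B = np.array([3, 4, 7])

# Wrap each array in a one-column DataFrame; the column label is 0.
df_a = pd.DataFrame(A)
# Deduplicate B so the merge yields exactly one row per element of A.
df_b = pd.DataFrame(B).drop_duplicates()

# how="left" keeps every row of df_a in its original order.
merged = df_a.merge(df_b, on=0, how="left", indicator=True)

# "left_only" means the value from A found no match in B.
not_in_b = (merged["_merge"] == "left_only").to_numpy()
print(not_in_b)  # [ True  True False False  True  True False]
```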

Upvotes: 1

Roy Shahaf

Reputation: 494

Try converting B into a structure better suited to lookups (a hash set or a sorted set), so that each membership test for an element of A is cheap.
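A minimal sketch of both suggestions, assuming B is non-empty. The hash-set version uses a Python set, so lookups are O(1) on average but the loop over A runs in pure Python. The sorted version stands in a sorted NumPy array for the "sorted set" and uses np.searchsorted for a vectorised binary search. Neither is guaranteed to beat np.in1d, which is itself sort-based by default; timing on the real 400K-element arrays is the only way to know.

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5, 6, 7])
B = np.array([3, 4, 7])  # assumed non-empty

# Option 1: hash set. Building it is O(len(B)); each lookup is O(1) on average.
b_set = set(B.tolist())
not_in_b_hash = np.fromiter(
    (x not in b_set for x in A.tolist()), dtype=bool, count=len(A)
)

# Option 2: sorted array plus binary search, fully vectorised.
b_sorted = np.sort(B)
# Position where each element of A would be inserted to keep b_sorted sorted.
idx = np.searchsorted(b_sorted, A)
# Clip so positions past the end still point at a valid element.
idx = np.clip(idx, 0, len(b_sorted) - 1)
# An element is present only if the value at its position equals it.
not_in_b_sorted = b_sorted[idx] != A

print(not_in_b_hash)    # [ True  True False False  True  True False]
print(not_in_b_sorted)  # [ True  True False False  True  True False]
```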

Upvotes: 3
