Reputation: 379
I have two large 1-D NumPy arrays, each with around 400K elements. For each element in array A, I need to check whether it exists in array B. I used np.in1d,
but it is too slow. Is there any way to speed this up?
import numpy as np
A = np.array([1, 2, 3, 4, 5, 6, 7])
B = np.array([3, 4, 7])
# invert=True marks the elements of A that are NOT in B
result = np.in1d(A, B, invert=True)
result
>> array([ True,  True, False, False,  True,  True, False])
Upvotes: 1
Views: 113
Reputation: 3733
I prefer pandas for that task:
import pandas as pd
A, B = pd.DataFrame(A), pd.DataFrame(B)
A.merge(B, on=0, how="left", indicator=True)
>>>    0     _merge
0  1  left_only
1  2  left_only
2  3       both
3  4       both
4  5  left_only
5  6  left_only
6  7       both
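To get the same boolean mask as in the question out of the merge result, something like the following should work. It is only a sketch: the names df_a, df_b, merged and not_in_b are mine, and the drop_duplicates() call is an added precaution, since repeated values in B would otherwise repeat the matching rows of A in a left merge.

```python
import numpy as np
import pandas as pd

A = np.array([1, 2, 3, 4, 5, 6, 7])
B = np.array([3, 4, 7])

df_a = pd.DataFrame(A)
# Assumption: deduplicate B first so each row of A appears exactly once.
df_b = pd.DataFrame(B).drop_duplicates()

# A left merge keeps the row order of df_a; _merge says where each row was found.
merged = df_a.merge(df_b, on=0, how="left", indicator=True)

# "left_only" means the value is in A but not in B.
not_in_b = (merged["_merge"] == "left_only").to_numpy()
```

Whether this beats NumPy on 400K elements depends on the data, so it is worth timing both.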
Upvotes: 1
Reputation: 494
Try converting B into a structure better suited to lookups, such as a hash set or a sorted array.
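A minimal sketch of both ideas, using the arrays from the question. The variable names are made up for illustration, the sorted variant assumes B is not empty, and I have not benchmarked either against np.in1d on 400K elements, so measure before switching.

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5, 6, 7])
B = np.array([3, 4, 7])

# Hash set: build it once, then each membership test is O(1) on average.
b_set = set(B.tolist())
in_b_hash = np.array([x in b_set for x in A.tolist()], dtype=bool)

# Sorted set: sort the unique values of B once, then binary-search all of A.
b_sorted = np.unique(B)                      # sorted, duplicates removed
pos = np.searchsorted(b_sorted, A)           # insertion index for each element of A
pos = np.clip(pos, 0, len(b_sorted) - 1)     # keep indices in bounds (needs non-empty B)
in_b_sorted = b_sorted[pos] == A             # a hit only if the value at that index matches

# Equivalent of invert=True in the question.
not_in_b = ~in_b_sorted
```

The hash-set version loops in Python, so it may well be slower than the vectorized searchsorted version for arrays this size; the sorted version costs one sort of B plus a binary search per element of A.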
Upvotes: 3