Reputation: 81
I have a set of, say, 5 elements,
[21,103,3,10,243]
and a huge NumPy array
[4,5,1,3,5,100,876,89,78......456,64,3,21,245]
with the 5 elements appearing repetitively in the bigger array.
I want to find all the indices where the elements of the small list appear in the larger array.
The small list will be less than 100 elements long and the large list will be about 10^7 elements long, so speed is a concern here. What is the most elegant and fastest way to do it in Python 3.x?
I have tried using np.where(), but it is dead slow. I am looking for a faster way.
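Roughly, the kind of call I mean (a sketch; big_array and small_list are illustrative names and data):

import numpy as np

small_list = [21, 103, 3, 10, 243]
big_array = np.random.randint(0, 1000, size=10**7)

# One np.where pass per small element: ~100 full scans of a 10^7-element array
indices = np.concatenate([np.where(big_array == v)[0] for v in small_list])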
Upvotes: 1
Views: 824
Reputation: 1
smaller_array = [21,103,3,10,243]
bigger_array = [4,5,1,3,5,100,876,89,78,456,64,3,21,243,243]
print(bigger_array)
print(smaller_array)

for val in smaller_array:
    if val in bigger_array:
        c = 0
        try:
            # list.index raises ValueError once no further occurrence exists
            while True:
                c = bigger_array.index(val, c)
                print(f'{val} is found in bigger_array at index {c}')
                c += 1
        except ValueError:
            pass
Upvotes: 0
Reputation: 1
smaller_array = [21,103,3,10,243]
bigger_array = [4,5,1,3,5,100,876,89,78,456,64,3,21,243,243]
print(bigger_array)
print(smaller_array)
for val in smaller_array:
    if val in bigger_array:
        c = bigger_array.index(val)
        while True:
            print(f'{val} is found in bigger_array at index {bigger_array.index(val,c)}')
            c = bigger_array.index(val,c)+1
            if val not in bigger_array[c:]:
                break
Upvotes: 0
Reputation: 1167
To speed things up, you can optimize like this:

Sorting the large array with numpy.sort(kind='heapsort') has time complexity n*log(n). (To report positions in the original array, sort indirectly via argsort so the original indices are kept.)

Binary search then has complexity log(n) for each element of the smaller array. Assuming there are m elements in the smaller array, the total search complexity is m*log(n).

Overall, this will give you a good speedup.
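A minimal sketch of this approach, assuming argsort is used so the original positions survive the sort (array contents and variable names are illustrative):

import numpy as np

big = np.array([4, 5, 1, 3, 5, 100, 876, 89, 78, 456, 64, 3, 21, 243, 243])
small = np.array([21, 103, 3, 10, 243])

order = big.argsort(kind='heapsort')   # O(n log n), keeps a map back to original indices
sorted_big = big[order]

# Two binary searches per small element bound the run of equal values: O(m log n) total
left = np.searchsorted(sorted_big, small, side='left')
right = np.searchsorted(sorted_big, small, side='right')

for val, lo, hi in zip(small, left, right):
    print(val, '->', order[lo:hi])     # original indices of every occurrence (empty if absent)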
Upvotes: 1
Reputation: 309
You can put the 100 elements to be found in a set, i.e. a hash table. Then loop through the elements of the huge array, asking if each element is in the set.
S = set([21,103,3,10,243])
A = [4,5,1,3,5,100,876,89,78......456,64,3,21,245]
result = []
for i, x in enumerate(A):
    if x in S:              # average O(1) set-membership test
        result.append(i)
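Because this loop runs in interpreted Python over ~10^7 elements, a vectorized variant of the same membership test, using np.isin (not from the original answer, just a sketch with illustrative data), would be:

import numpy as np

A = np.array([4, 5, 1, 3, 5, 100, 876, 89, 78, 456, 64, 3, 21, 245])
S = [21, 103, 3, 10, 243]

# Boolean mask of the positions whose value occurs in S, then their indices
result = np.nonzero(np.isin(A, S))[0]
print(result)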
Upvotes: 1