noobprogrammer

Reputation: 81

Find the locations (indices) of N elements in a huge NumPy array

I have a set of, say, 5 elements,

[21,103,3,10,243]

and a huge NumPy array

[4,5,1,3,5,100,876,89,78......456,64,3,21,245]

with the 5 elements appearing repeatedly in the larger array. I want to find all the indices where the elements of the small list appear in the larger array. The small list will have fewer than 100 elements and the large array will have about 10^7 elements, so speed is a concern. What is the fastest and most elegant way to do this in Python 3.x?

I have tried np.where(), but it is very slow. I am looking for a faster way.

Upvotes: 1

Views: 824

Answers (4)

Abhiraj Agrawal

Reputation: 1

smaller_array = [21,103,3,10,243]
bigger_array = [4,5,1,3,5,100,876,89,78,456,64,3,21,243,243]

print(bigger_array)
print(smaller_array)

for val in smaller_array:
    if val in bigger_array:
        c = 0
        try:
            while True:
                # list.index() raises ValueError once there are no more matches
                c = bigger_array.index(val, c)
                print(f'{val} is found in bigger_array at index {c}')
                c += 1  # resume the search just past the last match
        except ValueError:
            pass

Upvotes: 0

Ashit Kumar Rai

Reputation: 1

smaller_array = [21,103,3,10,243]
bigger_array = [4,5,1,3,5,100,876,89,78,456,64,3,21,243,243]

print(bigger_array)
print(smaller_array)

for val in smaller_array:
    if val in bigger_array:
        c = bigger_array.index(val)
        while True:
            print(f'{val} is found in bigger_array at index {bigger_array.index(val,c)}')
            c = bigger_array.index(val,c)+1
            if val not in bigger_array[c:]:
                break

Upvotes: 0

Moosa Saadat

Reputation: 1167

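To speed things up, you can:

  1. Sort the larger array.
  2. Perform a binary search (on the sorted larger array) for each number in the smaller array.

Time Complexity

Sorting with numpy.sort(kind='heapsort') has time complexity O(n log n), where n is the length of the larger array. Each binary search is O(log n), so with m elements in the smaller array the total search cost is O(m log n).

Note that sorting the values alone discards their original positions. To report indices into the original array, sort with numpy.argsort and map the matches back through the resulting index array. A value that occurs several times occupies a contiguous run in the sorted array, so both ends of the run have to be located.

The sort dominates the total cost, O(n log n + m log n), but it only has to be done once, however many values are looked up afterwards.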
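The sort-then-search idea can be sketched as follows. This is only an illustration under some assumptions: the helper name find_indices_sorted is made up for this example, the data is the short sample from the question, and numpy.argsort plus numpy.searchsorted stand in for the sort and the binary search so that the original indices are preserved.

```python
import numpy as np

def find_indices_sorted(big, small):
    """Hypothetical helper: map each value in `small` to the indices where it occurs in `big`."""
    big = np.asarray(big)
    # order[k] is the original index of the k-th smallest element;
    # a stable sort keeps equal values in their original order.
    order = np.argsort(big, kind="stable")
    sorted_big = big[order]
    result = {}
    for v in small:
        # Binary searches for both ends of the run of elements equal to v.
        lo = np.searchsorted(sorted_big, v, side="left")
        hi = np.searchsorted(sorted_big, v, side="right")
        result[v] = order[lo:hi]  # empty if v does not occur
    return result

big = np.array([4, 5, 1, 3, 5, 100, 876, 89, 78, 456, 64, 3, 21, 243, 243])
small = [21, 103, 3, 10, 243]
hits = find_indices_sorted(big, small)
for value, idx in hits.items():
    print(value, idx.tolist())
```

Because the sorted order is computed once, the same order and sorted_big arrays could be reused for later lookups against the same large array.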

Upvotes: 1

NotDijkstra

Reputation: 309

You can put the (up to 100) elements to be found in a set, which is a hash table. Then loop through the elements of the huge array, checking whether each one is in the set.

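S = {21, 103, 3, 10, 243}  # a set gives O(1) average-time membership tests
A = [4,5,1,3,5,100,876,89,78......456,64,3,21,245]  # abbreviated, as in the question
result = []
for i, x in enumerate(A):
    if x in S:
        result.append(i)  # record the index of every match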
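Since the data is already a NumPy array, the same membership test can probably be done without a Python-level loop. The sketch below is not part of the answer above; it swaps the set lookup for numpy.isin, which builds a boolean mask, and numpy.flatnonzero, which turns the mask into indices. The sample data is the short array from the question.

```python
import numpy as np

big = np.array([4, 5, 1, 3, 5, 100, 876, 89, 78, 456, 64, 3, 21, 243, 243])
small = np.array([21, 103, 3, 10, 243])

# Boolean mask: True wherever an element of big is one of the values in small.
mask = np.isin(big, small)

# Positions of the True entries, in ascending order.
indices = np.flatnonzero(mask)

print(indices.tolist())       # indices into big
print(big[indices].tolist())  # the matched value at each of those indices
```

Indexing big with the result, as in the last line, recovers which value was found at each position if the matches need to be grouped per value.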

Upvotes: 1
