jxm
jxm

Reputation: 19

perl - searching a large /sorted/ array for index of a string

I have a large array of approx 100,000 items, and a small array of approx 1000 items. I need to search the large array for each of the strings in the small array, and I need the index of the string returned. (So I need to search the 100k array 1000 times)

The large array has been sorted so I guess some kind of binary chop type search would be a lot more efficient than using a foreach loop (using 'last' to break the loop when found) which is what I started with. (this first attempt results in some 30m comparisons!)

Is there a built in search method that would produce a more efficient result, or am I going to have to manually code a binary search? I also want to avoid using external modules.

For the purposes of the question, just assume that I need to find the index of a single string in the large sorted array. (I only mention the 1000 items to give an idea of the scale)

Upvotes: 0

Views: 165

Answers (2)

amon
amon

Reputation: 57590

Using a binary search is probably optimal here. Binary search only needs O(log n) comparisions (here ~ 17 comparisons per lookup).

Alternatively, you can create a hash table that maps items to their indices:

my %positions;
$positions{ $large_array[$_] } = $_ for 0 .. $#large_array;

for my $item (@small_array) {
  say "$item has position $positions{$item}";
}

While now each lookup is possible in O(1) without any comparisons, you do have to create the hash table first. This may or may not be faster. Note that hashes can only use strings for keys. If your items are complex objects with their own concept of equality, you will have to derive a suitable key first.

Upvotes: 2

mpapec
mpapec

Reputation: 50637

This sounds like classic hash use case scenario,

my %index_for = map { $large_array[$_] => $_ } 0 .. $#large_array;

print "index in large array:", $index_for{ $small_array[1000] };

Upvotes: 4

Related Questions