Efficient algorithm for index mapping

Question

suppose you are given the following problem. You have two index sets that have a one-to-one mapping. For simplicity, let say, you have an array like int a [] = {21, 30, 45, 78} this list maps {1, 2, 3, 4} to {21, 30, 45, 78}. What is the most efficient way to obtain the reverse mapping, i.e. given index 30 you'd want the algorithm to return 2 for 45, you'd want 3 and so on. I can think of the following:

A binary search for the index. This is memory efficient and has complexity O(log n).
Have an array that has 79 elements and have reverseMap[21] = 1, reverseMap[30] = 2, reverseMap[45] = 3, reverseMap[78] = 4. This is O(1) and thus faster but is not memory efficient.

For my application both memory and speed are important. I'm short of memory since this is a number crunching code and thus will work with hundreds of millions of points. Speed is also important since the algorithm will be called many many times.

I sense Hash tables are useful here but I don't know much about it to comment. I'd appreciate any insight on the problem. Also, since the coding is done in c++ I'd like to see methods that utilize STL and not external libraries.

Mooing Duck · Accepted Answer

As always: PROFILE. We can guess, but without running your code, we might be wrong. I made a rough benchmark on ideone (times are based on my computer). I did one hundred thousand lookups of unsigned int in an array with ten million elements (I got bored waiting for your "hundreds of millions"), and these were my results:

unsorted vector  found 1633382974 in 2140 ticks.
sorted vector  found 1633382974 in 62 ticks.
unordered_map  found 1633382974 in 16 ticks.
std::map  found 1633382974 in 172 ticks. //that's half the time of a blink

However I have to note, maintaining these in the memory of your program will have some overhead over the unsorted vector. If we add the creation time to the timings of the hundred thousand lookups, we get:

unsorted vector  found 1633382974 in 2141 ticks.
sorted vector  found 1633382974 in 1797 ticks.
unordered_map  found 1633382974 in 16218 ticks.
std::map  found 1633382974 in 30749 ticks. //a full thirty seconds

So, as you can see, the timings depend entirely on what you do in your code. Try different things, time them with optimizations on, and go with the fastest for your code.

Efficient algorithm for index mapping

Answers (2)

Related Questions