Reputation: 7089
suppose you are given the following problem. You have two index sets that have a one-to-one mapping. For simplicity, let say, you have an array like int a [] = {21, 30, 45, 78}
this list maps {1, 2, 3, 4} to {21, 30, 45, 78}. What is the most efficient way to obtain the reverse mapping, i.e. given index 30
you'd want the algorithm to return 2
for 45
, you'd want 3
and so on. I can think of the following:
A binary search for the index. This is memory efficient and has complexity O(log n)
.
Have an array that has 79
elements and have reverseMap[21] = 1, reverseMap[30] = 2, reverseMap[45] = 3, reverseMap[78] = 4
. This is O(1)
and thus faster but is not memory efficient.
For my application both memory and speed are important. I'm short of memory since this is a number crunching code and thus will work with hundreds of millions of points. Speed is also important since the algorithm will be called many many times.
I sense Hash tables are useful here but I don't know much about it to comment. I'd appreciate any insight on the problem. Also, since the coding is done in c++
I'd like to see methods that utilize STL
and not external libraries.
Upvotes: 1
Views: 907
Reputation: 66961
As always: PROFILE. We can guess, but without running your code, we might be wrong. I made a rough benchmark on ideone (times are based on my computer). I did one hundred thousand lookups of unsigned int
in an array with ten million elements (I got bored waiting for your "hundreds of millions"), and these were my results:
unsorted vector found 1633382974 in 2140 ticks.
sorted vector found 1633382974 in 62 ticks.
unordered_map found 1633382974 in 16 ticks.
std::map found 1633382974 in 172 ticks. //that's half the time of a blink
However I have to note, maintaining these in the memory of your program will have some overhead over the unsorted vector. If we add the creation time to the timings of the hundred thousand lookups, we get:
unsorted vector found 1633382974 in 2141 ticks.
sorted vector found 1633382974 in 1797 ticks.
unordered_map found 1633382974 in 16218 ticks.
std::map found 1633382974 in 30749 ticks. //a full thirty seconds
So, as you can see, the timings depend entirely on what you do in your code. Try different things, time them with optimizations on, and go with the fastest for your code.
Upvotes: 2
Reputation: 26429
What is the most efficient way to obtain the reverse mapping
Dual std::map<value, value>
. or std::unordered_map
I.e. any map class, dual.
I.e. first map maps values from arrayA to arrayB, and second one maps values from arrayB to arrayA. Or first map maps index to value and second maps value to index.
You can do same thing using std::lower_bound
(binary search) and two std::vector<std::pair<value, value> >
, but you'll need to ensure that all data is sorted. It'll use less memory than two std::map
s, BUT you most likely will spend more time ensuring data is sorted.
For my application both memory and speed are important
hundreds of millions of points
Switch to 64bit, buy extra memory. Or store sorted data on disk (allows binary search of partially loaded data) and forget about speed, or try to process it somehow using "read from stdin, immediately write to stdout" fashion. Please note that hardware is cheaper than development time. Without knowing type of your data it won't be possible to recommend anything else.
Upvotes: 0