Reputation: 33
I am looking for the fastest way to search a sorted, fixed array of 32-bit keys. The array's size and data are static and will never change; it holds ~1000-10000 unique elements. The search range is significantly broader (~100000), so many searched values will not be found. I am interested in exact matches only.
A potentially interesting property of the keys is that even when they are not close in terms of integer value, most of them differ by only a few bits (~1-4) from their closest neighbor.
Most answers I found point towards binary search, but none deal with the case of a static array, which probably opens up some optimization possibilities.
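For reference, the binary-search baseline can be sketched with the standard library (Java here, to match the answer's code below; the sample keys are made up):

```java
import java.util.Arrays;

public class Baseline {
    public static void main(String[] args) {
        // Hypothetical sorted, static set of 32-bit keys.
        int[] keys = {3, 17, 42, 256, 1024, 65536};

        // Arrays.binarySearch returns a non-negative index on an exact match,
        // and a negative insertion-point encoding otherwise.
        System.out.println(Arrays.binarySearch(keys, 256) >= 0); // true
        System.out.println(Arrays.binarySearch(keys, 5) >= 0);   // false
    }
}
```

This is the O(log n) baseline any static-array scheme has to beat.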
I have full control over the data structure; right now it is a fixed, sorted array, but I could change that if it's not optimal. Since the data doesn't change, I could also add precomputed information, as long as it doesn't take an unreasonable amount of memory.
The goal is to be efficient in both CPU and memory, although CPU is the priority here.
I'm using C++, although that probably won't affect the answer much.
Upvotes: 3
Views: 464
Reputation: 3484
Considering that your static arrays never change, and that you have effectively unlimited pre-processing power, I think the best approach would be to create a specific hash function for each of your arrays.
My approach: define a parameterized hash function (code in Java):
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Function;

private static Function<Long, Integer> createHashFunction(int sz) {
    // Six randomly chosen shift amounts; every call produces a new candidate hash.
    int mvLeft   = ThreadLocalRandom.current().nextInt(30);
    int mvRight  = ThreadLocalRandom.current().nextInt(16);
    int mvLeft2  = ThreadLocalRandom.current().nextInt(10);
    int mvRight2 = ThreadLocalRandom.current().nextInt(16);
    int mvLeft3  = ThreadLocalRandom.current().nextInt(16);
    int mvRight3 = ThreadLocalRandom.current().nextInt(20);
    return (key) -> {
        // These operations are totally random and have no mathematical background beneath them!
        key = ~key + (key << mvLeft);
        key = key ^ (key >>> mvRight);
        key = key + (key << mvLeft2);
        key = key ^ (key >>> mvRight2);
        key = key + (key << mvLeft3);
        key = key ^ (key >>> mvRight3);
        // floorMod rather than Math.abs(key) % sz: Math.abs(Long.MIN_VALUE) is still
        // negative, which would yield a negative bucket index.
        return (int) Math.floorMod(key, sz); // sz is the size of the target array
    };
}
For each target array, search for a combination of parameters for which the maximum bucket size is smallest.
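That parameter search could be sketched like this (a self-contained, trimmed variant of createHashFunction above; the key count, table size, and trial budget are assumptions):

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Function;

public class FindBestHash {
    // Trimmed version of the answer's createHashFunction, so this sketch compiles alone.
    static Function<Long, Integer> createHashFunction(int sz) {
        int mvLeft  = ThreadLocalRandom.current().nextInt(1, 30); // 1+ avoids degenerate shifts
        int mvRight = ThreadLocalRandom.current().nextInt(1, 16);
        return key -> {
            key = ~key + (key << mvLeft);
            key = key ^ (key >>> mvRight);
            return (int) Math.floorMod(key, sz);
        };
    }

    // Largest bucket produced when hashing every key into sz buckets.
    static int maxBucketSize(long[] keys, Function<Long, Integer> h, int sz) {
        int[] counts = new int[sz];
        int max = 0;
        for (long k : keys) max = Math.max(max, ++counts[h.apply(k)]);
        return max;
    }

    public static void main(String[] args) {
        // Hypothetical data set: 1000 random 32-bit keys.
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        long[] keys = new long[1000];
        for (int i = 0; i < keys.length; i++) keys[i] = rnd.nextInt() & 0xFFFFFFFFL;

        int sz = 2 * keys.length;           // assumed table size: load factor 0.5
        int bestMax = Integer.MAX_VALUE;
        Function<Long, Integer> bestHash = null;
        for (int trial = 0; trial < 500; trial++) { // pre-processing budget is "infinite"
            Function<Long, Integer> h = createHashFunction(sz);
            int m = maxBucketSize(keys, h, sz);
            if (m < bestMax) { bestMax = m; bestHash = h; }
        }
        // With enough trials, the flattest candidate should have small buckets.
        System.out.println(bestMax <= 5);
    }
}
```

The retained bestHash (and its parameters) is what you would bake in at build time for that particular array.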
Some testing (input array of size 10k, filled with random elements):
Considering that with a maximum bucket size of 2 it is possible to pack both values into one 64-bit integer, this approach costs only one memory access, and the hashing itself uses only xor, add, and shift, which are extremely cheap operations for the CPU, as is the final bit comparison.
However, your data may not be so well-behaved and may require a bucket size of 3, which destroys the possibility of using a long long for bucket items. In that case you can try to find a decent hash function instead of the random mess I've written.
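The two-keys-in-one-long layout might look like this (a minimal sketch; the sentinel and sample values are assumptions):

```java
public class PackedBucket {
    static final int EMPTY = -1; // assumed sentinel value not present among the keys

    // Pack two 32-bit keys into one 64-bit slot: one in the high half, one in the low half.
    static long pack(int a, int b) {
        return ((long) a << 32) | (b & 0xFFFFFFFFL);
    }

    // One 64-bit load plus two comparisons per probe; no branching into a chain.
    static boolean bucketContains(long slot, int key) {
        return (int) (slot >>> 32) == key || (int) slot == key;
    }

    public static void main(String[] args) {
        long slot = pack(42, 7);
        System.out.println(bucketContains(slot, 42)); // true
        System.out.println(bucketContains(slot, 7));  // true
        System.out.println(bucketContains(slot, 8));  // false
    }
}
```

A lookup is then: hash the query key, load the one long for that bucket, and compare both halves.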
Upvotes: 3