Reputation: 4282
I have a std::map for some packet processing program.
I didn't notice it before profiling, but unfortunately this map lookup alone consumes about 10% of the CPU time (it's called too many times).
Usually there are at most 10 keys in the input data, so I'm trying to implement a kind of key cache in front of the map.
The key is a 13-bit integer. I know there are only 8192 possible keys, and an array of 8192 items would give constant-time lookup, but I already feel ashamed and don't want to use such a naive approach :(
Now I'm just looking for some hashing method that yields a 4-bit code from a 13-bit integer very fast.
Any cool ideas?
Thanks in advance.
UPDATE
Besides my shame, I don't have total control over the source code, and creating a new array for this purpose is almost prohibited.
The project manager (who ran the profiler) said a linked list showed a small performance gain and recommended using std::list instead of std::map.
UPDATE
The key values are random (no relationship) and don't have a good distribution.
Sample:
1) 0x100, 0x101, 0x10, 0x0, 0xffe
2) 0x400, 0x401, 0x402, 0x403, 0x404, 0x405, 0xff
Upvotes: 4
Views: 609
Reputation: 16761
You may want to go with a middle solution and the open addressing technique: one array of size 256. The index into the array is some simple hash function, like XOR of the two bytes. Each element of the array is a struct {key, value}. Collisions are handled by storing the collided element at the next available index. If you need to delete an element from the array, and deletion is rare, just recreate the array (build a temporary list from the remaining elements, then rebuild the array from that list).
If you pick your hash function smartly, there would almost never be any collisions. For instance, from your two examples one such hash would be to XOR the low nibble of the high byte with the high nibble of the low byte (and do what you like with the remaining 13th bit).
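A minimal sketch of that scheme, assuming linear probing and the XOR-of-two-bytes hash (the names Entry, table13, and hash13 are illustrative, not from the question's code):
#include <cstddef>
#include <stdint.h>

// One slot of the open-addressing table.
struct Entry {
    uint16_t key;
    int      value;
    bool     used;
};

static const size_t TABLE_SIZE = 256;
static Entry table13[TABLE_SIZE] = {};

// Simple hash: XOR the two bytes of the 13-bit key.
inline size_t hash13(uint16_t key) {
    return ((key >> 8) ^ key) & 0xFF;
}

void put(uint16_t key, int value) {
    size_t i = hash13(key);
    // Linear probing: walk forward until we find the key or a free slot.
    // With ~10 keys in a 256-slot table this loop terminates quickly.
    while (table13[i].used && table13[i].key != key)
        i = (i + 1) % TABLE_SIZE;
    table13[i].key   = key;
    table13[i].value = value;
    table13[i].used  = true;
}

bool get(uint16_t key, int &value) {
    size_t i = hash13(key);
    while (table13[i].used) {
        if (table13[i].key == key) {
            value = table13[i].value;
            return true;
        }
        i = (i + 1) % TABLE_SIZE;
    }
    return false;  // key not present
}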
Upvotes: 2
Reputation: 9039
If there are really only going to be something like 10 active entries in your table, you might seriously consider using an unsorted vector to hold this mapping. Something like this:
#include <cstddef>
#include <utility>
#include <vector>

typedef int key_type;
typedef int value_type;

std::vector<std::pair<key_type, value_type> > mapping;

inline void put(key_type key, value_type value) {
    // Overwrite the value if the key is already present.
    for (size_t i = 0; i < mapping.size(); ++i) {
        if (mapping[i].first == key) {
            mapping[i].second = value;
            return;
        }
    }
    mapping.push_back(std::make_pair(key, value));
}

inline value_type get(key_type key) {
    for (size_t i = 0; i < mapping.size(); ++i) {
        if (mapping[i].first == key) {
            return mapping[i].second;
        }
    }
    // do something reasonable if not found?
    return value_type();
}
Now, the asymptotic speed of these algorithms (each O(n)) is much worse than you'd have with either a red-black tree (like std::map at O(log n)) or a hash table (O(1)). But you're not talking about dealing with a large number of objects, so asymptotic estimates don't really buy you much.
Additionally, std::vector buys you both low overhead and locality of reference, which neither std::map nor std::list can offer. So it's more likely that a small std::vector will stay entirely within the L1 cache. As it's almost certainly the memory bottleneck that's causing your performance issues, using a std::vector with even my poor choice of algorithm will likely be faster than either a tree or linked list. Of course, only a few solid profiles will tell you for sure.
There are certainly algorithms that might be better choices: a sorted vector could potentially give even better performance; a well tuned small hash table might work as well. I suspect that you'll run into Amdahl's law pretty quickly trying to improve on a simple unsorted vector, however. Pretty soon you might find yourself running into function call overhead, or some other such concern, as a large contributor to your profile.
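For illustration, the sorted-vector variant mentioned above could be searched with std::lower_bound. This sketch reuses the key_type/value_type typedefs from the code above (get_sorted and key_less are made-up names):
#include <algorithm>

std::vector<std::pair<key_type, value_type> > sorted_mapping;

// Compare an entry against a bare key so lower_bound ignores values.
struct key_less {
    bool operator()(const std::pair<key_type, value_type>& entry,
                    key_type key) const {
        return entry.first < key;
    }
};

inline value_type get_sorted(key_type key) {
    std::vector<std::pair<key_type, value_type> >::iterator it =
        std::lower_bound(sorted_mapping.begin(), sorted_mapping.end(),
                         key, key_less());
    if (it != sorted_mapping.end() && it->first == key)
        return it->second;
    return value_type();  // same "not found" fallback as above
}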
Upvotes: 2
Reputation: 30035
I would use a lookup table. It's tiny unless you are using a microcontroller or something.
Otherwise I would do this -
Generate a table of, say, 30 elements. For each lookup, calculate a hash value of (key % 30) and compare it with the key stored at that location in the table. If the key is there, you've found your value. If the slot is empty, add the key. If the stored key is a different one, skip to the next cell and repeat.
With 30 cells and 10 keys, collisions should be rare, but if you get one it's fast to skip to the next cell, and normal lookups are simply a modulus and a compare operation, so fairly fast.
Upvotes: 0
Reputation: 4040
You might first want to see if there are any unnecessary calls to the key lookup. Ideally you only do this once per packet -- each time you call a function there is going to be some overhead, so getting rid of extra calls is good.
Map is generally pretty fast, but if there is any exploitable pattern in the way keys are mapped to items you could use that and potentially do better. Could you provide a bit more information about the keys and the associated 4-bit values? E.g. are they sequential, is there some sort of pattern?
Finally, as others have mentioned, a lookup table is very fast: 8192 values * 4 bits is only 4 kB, a tiny amount of memory indeed.
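For illustration only, assuming the mapped values really do fit in 4 bits, such a table could pack two values per byte (packed, lookup4, and store4 are made-up names):
#include <stdint.h>

// 8192 4-bit values, two per byte: 4096 bytes total.
static uint8_t packed[8192 / 2] = {};

inline uint8_t lookup4(uint16_t key) {  // key is the 13-bit value
    uint8_t byte = packed[key >> 1];
    return (key & 1) ? (byte >> 4) : (byte & 0x0F);
}

inline void store4(uint16_t key, uint8_t value) {
    uint8_t &slot = packed[key >> 1];
    if (key & 1)
        slot = (uint8_t)((slot & 0x0F) | (value << 4));
    else
        slot = (uint8_t)((slot & 0xF0) | (value & 0x0F));
}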
Upvotes: 0
Reputation: 490148
Unless you're writing for some sort of embedded system where 8K is really significant, just use the array and move on. If you really insist on doing something else, you might consider a perfect hash generator (e.g., gperf).
Upvotes: 2
Reputation: 14159
When you always have just about ten keys, use a list (or array). Do some benchmarking to find out whether or not a sorted list (or array) with binary search improves performance.
Upvotes: 1
Reputation: 211
I agree with GWW, you don't use that much memory in the end... But if you want, you could use an array of 11 or 13 linked lists and hash the keys with the modulo (%) operator. If the number of keys is less than the array size, complexity still tends to be O(1).
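A rough sketch of that chained layout, assuming 13 buckets and int keys and values (BUCKETS, buckets, and find are illustrative names):
#include <list>
#include <utility>

static const int BUCKETS = 13;
static std::list<std::pair<int, int> > buckets[BUCKETS];

// Returns a pointer to the value for key, or 0 if absent.
int* find(int key) {
    std::list<std::pair<int, int> >& bucket = buckets[key % BUCKETS];
    for (std::list<std::pair<int, int> >::iterator it = bucket.begin();
         it != bucket.end(); ++it) {
        if (it->first == key)
            return &it->second;
    }
    return 0;  // not found
}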
Upvotes: 1
Reputation: 44093
Assuming your lookup table contains some basic type, it's almost no memory at all: even on 64-bit systems it's only 64 kB. There is no shame in using a lookup table like that; it has some of the best performance you can get.
Upvotes: 3