Zotta

Reputation: 2603

C++ map implementation with good performance

I want to map some keys to other data structures, where the list of keys is received from the internet.

std::unordered_map is O(1) on average, but O(N) in the worst case.
std::map is always O(log N).

Is there another map implementation with O(1) best case and O(log N) worst case performance?

Upvotes: 0

Views: 200

Answers (3)

rmi

Reputation: 532

How about this, just a suggestion.

Sketch in C++ (Key and Val stand in for your own types):

#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>

using Key = std::string;               // placeholder key type
struct Val { Key k; /* payload */ };   // assumes the key can be stored inside the value

// map1's keys are already hashes, so its hasher just passes them through
struct IdentityHash { std::size_t operator()(std::size_t h) const { return h; } };

std::unordered_map<std::size_t, Val, IdentityHash> map1;  // one entry per distinct hash
std::map<Key, Val> map2;                                  // overflow for colliding hashes

void insert(const Key& key, Val val) {
    std::size_t my_hash = std::hash<Key>{}(key);
    val.k = key;
    auto it = map1.find(my_hash);
    if (it == map1.end())
        map1[my_hash] = val;           // fresh hash: the common O(1) path
    else if (it->second.k == key)
        it->second = val;              // same key again: update in place
    else
        map2[key] = val;               // genuine hash collision: spill to the ordered map
}

This way we stop forming long linked lists inside the unordered_map, which is what causes the O(N) behaviour. In the best case you fill only map1. Let me know if the sketch is not clear.
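For lookups you need the matching probe: check map1 first, verify that the stored key really is the one asked for, and fall back to map2 otherwise. A minimal sketch to go with the code above (find_val is a hypothetical helper name, not part of the original suggestion):

Val* find_val(const Key& key) {
    std::size_t my_hash = std::hash<Key>{}(key);
    auto it1 = map1.find(my_hash);
    if (it1 != map1.end() && it1->second.k == key)
        return &it1->second;           // O(1) hit: hash present and stored key matches
    auto it2 = map2.find(key);         // O(log N) fallback for colliding keys
    return it2 != map2.end() ? &it2->second : nullptr;
}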

Upvotes: 0

Mikael Persson

Reputation: 18562

The best and worst performance of the std::unordered_map is very much dependent on the quality of the hash function. The worst O(N) performance is when all your keys map to the same hash-value (i.e., 100% collision-rate). So, unless you have an extremely terrible hash function, you won't really get that kind of worst-case performance.

Determining the performance of a hash-map is largely a matter of probability and statistics about your data. Basically, you have a distribution of your input data (keys), which is then mapped to a distribution of hash-values. If you know your input distribution well, you can design a good hash function that maps it to a uniform distribution of hash-values. If the distribution of hash-values is very uniform, then the probability of collisions is low, and thus the sizes of your buckets (groups of values with the same hash-value) will be small on average, which leads to very good average-case performance. You could say that the average-case performance is O(B), where B is the average size of the buckets. The better your hash function, the lower the collision probability, the smaller your buckets, and the better your average performance; that is what you should aim for.
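To make the bucket picture concrete, the standard bucket interface lets you measure it for your own keys and hash function. A small sketch of my own (the key set is made up for the example):

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <string>
#include <unordered_map>

int main() {
    std::unordered_map<std::string, int> m;
    for (int i = 0; i < 100000; ++i)
        m["key" + std::to_string(i)] = i;  // reasonably well-distributed keys

    std::size_t worst = 0;                 // size of the fullest bucket
    for (std::size_t b = 0; b < m.bucket_count(); ++b)
        worst = std::max(worst, m.bucket_size(b));
    std::cout << "buckets: " << m.bucket_count()
              << "  load factor: " << m.load_factor()
              << "  largest bucket: " << worst << '\n';
}

With a decent hash the largest bucket should stay small (single digits), which is the low-B regime described above.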

You cannot, in general, provide a guarantee that you won't get the worst performance O(N), but you can guarantee that the probability of ever striking such a bad case will be very low.

That said, there could be a data structure which stores the elements of each bucket in a way that makes look-up within the bucket faster, such as a binary tree or a sorted array. I don't know of any particular container available that does this, but it would reduce the worst case to about O(log(N)), at the price of an additional constant-factor burden. So, at the end of the day, you'd have to test it to find out for sure.
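As a rough illustration of that idea, here is a sketch of my own (TreeBucketMap is a made-up name, not an existing container): a fixed array of buckets where each bucket is a std::map, so even a bucket that receives every single element still answers lookups in O(log N). For what it's worth, Java 8's HashMap takes a similar approach by converting oversized buckets into red-black trees.

#include <array>
#include <cstddef>
#include <functional>
#include <map>

// Hash map with tree buckets: a 100%-collision workload degrades to
// O(log N) per lookup instead of O(N), at a constant-factor cost.
template <typename K, typename V, std::size_t NBuckets = 1024>
class TreeBucketMap {
    std::array<std::map<K, V>, NBuckets> buckets_;
    std::map<K, V>& bucket_for(const K& key) {
        return buckets_[std::hash<K>{}(key) % NBuckets];
    }
public:
    V& operator[](const K& key) { return bucket_for(key)[key]; }
    std::size_t count(const K& key) { return bucket_for(key).count(key); }
};

A production version would also need erasure, iteration, and rehashing; this only shows the worst-case bound.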

Upvotes: 1

Mark B

Reputation: 96243

As you noted, the worst case for std::unordered_map is linear, so instead of asking for a better worst case (no such container exists - if it did, wouldn't the standard use it, or at least supply such a variation?), let's instead consider what causes the worst case and see if we can prevent it.

In this case the std::unordered_map is (almost certainly) a hash map, so the worst case happens when every item you insert hashes to the same value and they all chain into a single bucket (effectively turning the hash map into a linked list).

So as long as you have even a remotely reasonable hash function, the worst case will never happen in practice and you'll wind up with constant-time operations.
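You can provoke that degenerate case deliberately with a constant hash, which chains every element into one bucket. A small sketch of my own to demonstrate:

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <unordered_map>

// The worst possible hash: every key gets the same value, so the
// table degenerates into one long chain and lookups become O(N).
struct ConstantHash {
    std::size_t operator()(int) const { return 0; }
};

int main() {
    std::unordered_map<int, int, ConstantHash> m;
    for (int i = 0; i < 1000; ++i)
        m[i] = i;
    std::size_t worst = 0;
    for (std::size_t b = 0; b < m.bucket_count(); ++b)
        worst = std::max(worst, m.bucket_size(b));
    std::cout << "largest bucket: " << worst << " of " << m.size() << " elements\n";
}

Any real-world hash for these keys would spread them across buckets; this only happens when the hash is pathological.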

Upvotes: 3
