Reputation: 69672
I've read somewhere that std::map is, with current compilers, still the most efficient associative container we have in the STL, even with std::unsorted_map that --from what I read somewhere, I'm not sure where-- becomes more efficient on find() only if there is a lot of entries, like more than 40k.
So now I'm not really sure anymore because I always assumed that a hash map is more efficient at least in case of string keys.
So to be short:
If I have to choose an associative container with unknown entry count and with std::string as keys, what would be (at least in theory) the more efficient (on speed) choice for finding?
Upvotes: 2
Views: 1819
Reputation: 2624
When you have 40k entries or more, strings (or lists of elements, etc.) should not be used as associative keys in the standard containers. Instead, there comes a point much earlier where a trie or a ternary tree become better options. Both of those can build associative structures that only compare each character of your string (or element of your list, etc.) once. Ordered maps compare at every node (and so are O(m log n) - m size of string, n number of elements), and the unordered maps suffer from far more collisions at those sizes.
A ternary tree (each child branches to characters less, equal, or greater on a single char compare) takes the least memory of the better implementations, but tries are by far the fastest. Both of these may be built from boost.graph or some other generic graph library.
Upvotes: 1
Reputation: 476970
Profile, profile, profile...
The problem with strings as keys is that comparing them is very slow (think difference in the last character of a 1000-character string). The advantage of an unordered_map
with a string key comes at least in part from the fact that only the fixed-width hash values have to be compared, so in practice the unordered map may well be a lot faster.
The hash implementation may choose, for example, to use only a fixed number of spread-out digits to compute the hash value and thus end up putting some near-identical strings in the same bucket, so it's a trade-off. You can probably concoct a set of key values for which both containers would perform very poorly, but for a "random" or "typical" collection of strings, my bet is on the hash container.
Upvotes: 10