Reputation: 2421
Why is the C++ set implemented as a binary tree instead of as a hashset, which can provide average case complexity of O(1) as compared to the O(log n) provided by a binary tree?
Upvotes: 4
Views: 8143
Reputation: 106096
There are two factors here:
unordered_set
These points are discussed below.
There are lots of pros/cons, and both are available (as of C++11) to let the programmer choose. Hash-based sets were not available because there simply wasn't agreement on the implementation in time for them to be included in earlier standards.
set
iteration is in sort order, whereas a hashed container unsorted_set
is effectively randomised-order traversal
set
supports lower_bound
, upper_bound
, which aren't practical to provide on an unsorted_set
set
uses a comparison function (by default the contained type's operator<
, whereas unordered_set
requires a hashing function (defaults are provided for some types but they can be pretty mediocre depending on your actual keys, and quality hashing can be time consuming) and a key equality functionstd::unordered_set
may be invalidated (see 23.2.5p14 for when exactly), whereas iterators into a std::map
are never invalidated by inserts (pointers and references to unordered_set
elements remain valid though).From an interview with Stepanov:
Question: I find two hash table implementations in the D.Musser site, and they were both working and quite smart - much smarter than hash tables commonly found in class libraries. Why were hash tables not included into STL?
Answer: Politics. They have to be in. Our new implementation of STL does contain them. In general, we need to develop a mechanism for adding things to STL. After all, STL is an extensible framework, it begs to be extended. There are many data structures that are missing, for example, singly linked lists, matrices, and graphs. SGI is willing to lead the way in extending STL.
(full interview at http://www.stlport.org/resources/StepanovUSA.html)
Upvotes: 1
Reputation: 241721
According to John Nagle, from a 2006 posting to comp.lang.c++.moderated:
The actual reason is that the person who was writing the hash table part of the spec didn't finish it in time. That's all.
Standardization processes are like that.
Upvotes: 10
Reputation: 9691
Because C++ sets are ordered by the T
's comparison operator, which makes it possible to iterate over the members in a predictable way. If you know that all you will do with the set is insert, test membership, and/or remove elements, then std::unordered_set
, which implements a hashset, exists for that since C++11.
Upvotes: 17