Cricketer
Cricketer

Reputation: 2421

C++ STL set implementation

Why is the C++ set implemented as a binary tree instead of as a hashset, which can provide average case complexity of O(1) as compared to the O(log n) provided by a binary tree?

Upvotes: 4

Views: 8143

Answers (3)

Tony Delroy
Tony Delroy

Reputation: 106096

There are two factors here:

  • both binary tree and hash based associative sets are useful
  • the original STL only implemented the former due to a lack of time to work through the "politics" of Standarisation; SGI, GNU, MS, boost and others have provided non-Standard hash-based versions for many years, and C++11 introduces unordered_set

These points are discussed below.

BOTH ARE USEFUL

There are lots of pros/cons, and both are available (as of C++11) to let the programmer choose. Hash-based sets were not available because there simply wasn't agreement on the implementation in time for them to be included in earlier standards.

  • set iteration is in sort order, whereas a hashed container unsorted_set is effectively randomised-order traversal
    • set supports lower_bound, upper_bound, which aren't practical to provide on an unsorted_set
  • set uses a comparison function (by default the contained type's operator<, whereas unordered_set requires a hashing function (defaults are provided for some types but they can be pretty mediocre depending on your actual keys, and quality hashing can be time consuming) and a key equality function
  • either may be faster for smaller values of N - not everyone's dealing with billions of elements so it's sensible to offer choice even from a performance perspective
  • there may be significant differences in memory usage for small objects, though I'm not sure what overall guidelines would be true across implementations so can just suggest measuring in your program if you care
  • as elements are added, existing iterators into a std::unordered_set may be invalidated (see 23.2.5p14 for when exactly), whereas iterators into a std::map are never invalidated by inserts (pointers and references to unordered_set elements remain valid though).

Why earlier C++ Standards didn't have a hash based set

From an interview with Stepanov:

Question: I find two hash table implementations in the D.Musser site, and they were both working and quite smart - much smarter than hash tables commonly found in class libraries. Why were hash tables not included into STL?

Answer: Politics. They have to be in. Our new implementation of STL does contain them. In general, we need to develop a mechanism for adding things to STL. After all, STL is an extensible framework, it begs to be extended. There are many data structures that are missing, for example, singly linked lists, matrices, and graphs. SGI is willing to lead the way in extending STL.

(full interview at http://www.stlport.org/resources/StepanovUSA.html)

Upvotes: 1

rici
rici

Reputation: 241721

According to John Nagle, from a 2006 posting to comp.lang.c++.moderated:

The actual reason is that the person who was writing the hash table part of the spec didn't finish it in time. That's all.

Standardization processes are like that.

Upvotes: 10

Matt Phillips
Matt Phillips

Reputation: 9691

Because C++ sets are ordered by the T's comparison operator, which makes it possible to iterate over the members in a predictable way. If you know that all you will do with the set is insert, test membership, and/or remove elements, then std::unordered_set, which implements a hashset, exists for that since C++11.

Upvotes: 17

Related Questions