ralzaul
ralzaul

Reputation: 4470

How iterating over a std::set returns sorted results

The container std::set (or std::map) is a data structure that STL provides. In almost all compilers it is implemented as a R&B tree with guaranteed log(n) insertion, find and removal time.

https://en.wikipedia.org/wiki/Red%E2%80%93black_tree

In a red and black tree the elements are sorted based on the "less" operator of the stored element. So basically if a root is N + 1 , N will be on the left sub-tree while N + 2 will be on the right sub-tree and this ordering will be decided by the less operator.

My question is when executing the below code:

 set<unsigned long>::iterator it;
 for (it = myset.begin(); it != myset.end(); it++) {
     cout << *it;
 }

the elements are returned in a sorted order. How can this be possible considering the fact that the underlying data structure is a red and black tree? Is there a separate linked list stored to be able to iterate from the leftmost sub-tree to rightmost sub-tree? If not what is the mechanism behind this implementation using a R&B tree?

Upvotes: 9

Views: 983

Answers (2)

Klitos Kyriacou
Klitos Kyriacou

Reputation: 11621

The iterator does an [in-order depth-first tree traversal].1 This is typically implemented in a recursive algorithm. Since the iterator usage can't be implemented recursively, the iterator internally keeps a stack of where it's been so that it can go back up the tree.

Update: thanks to Chris Dodd for pointing out that RB tree nodes have pointers to their parents, so the iterator can simply follow those up to the next element.

Upvotes: 2

Benno
Benno

Reputation: 5456

We can find a definitive answer by looking at the source code (libstdc++ 5.2.1 in this case). This is how a tree node looks like:

// <libstdc++>/include/bits/stl_tree.h
struct _Rb_tree_node_base {
  typedef _Rb_tree_node_base* _Base_ptr;

  _Rb_tree_color    _M_color;
  _Base_ptr     _M_parent;
  _Base_ptr     _M_left;
  _Base_ptr     _M_right;

  // ...
}

So every node contains a color, and pointers to its parent and its left and right children. Incrementing is implemented as:

//  <libstdc++>/include/bits/stl_tree.h
struct _Rb_tree_iterator {
  _Self& operator++() {
    _M_node = _Rb_tree_increment(_M_node);
    return *this;
  }

// ...

private:
    _Base_ptr _M_node;
};

The actual increment is not in the public headers anymore, but in the compiled part of the library:

// <libstdc++>/src/c++98/tree.cc
static _Rb_tree_node_base* local_Rb_tree_increment(_Rb_tree_node_base* __x) throw ()
{
    if (__x->_M_right != 0) {
        __x = __x->_M_right;
        while (__x->_M_left != 0)
            __x = __x->_M_left;
     } else {
         _Rb_tree_node_base* __y = __x->_M_parent;
         while (__x == __y->_M_right)  {
             __x = __y;
             __y = __y->_M_parent;
         }
         if (__x->_M_right != __y)
             __x = __y;
     }
     return __x;
 }

So, ultimately, it's a textbook implementation of tree traversal: The iterator holds a pointer to the "current" node, and to get to the next node it goes up in the tree as long as it was coming from the right child. If it was coming from the left child, it will descend to leftmost child node of the right child.

Upvotes: 9

Related Questions