user3250889

Reputation: 191

Why Aren't More Iterators Random Access?

I'm trying to learn more about the STL iterators in C++. I understand how different data structures have different iterators, but I don't understand why some iterators are not RandomAccess. For instance, why is the LinkedList iterator not a random access iterator? I understand that a LinkedList is not a "random access structure" per se, but couldn't we implement the iterator to give the illusion of a random access structure? For instance, the LinkedList has a Bidirectional Iterator, which does not define the + or += operator but does define the ++ operator. Couldn't we just define the + and += operators using something like:

iterator& operator+= (int steps) {
     for (int i = 0; i < steps; ++i) {
          this->operator++();
     }
     return *this; // += is expected to return the advanced iterator
}

After looking at the RandomAccessIterator requirements, I think we could probably implement most if not all of those functions for the LinkedList, so why don't we? I'm guessing it's because some operations would have essentially O(N) time complexity, but I'm not sure if that is the key concern. If we were to implement RandomAccessIterators using this approach, what consequences would this have on using the LinkedList with STL algorithms? Would we suddenly be able to sort a LinkedList using the std::sort function?

Upvotes: 4

Views: 841

Answers (3)

Yakk - Adam Nevraumont

Reputation: 275385

std::prev and std::next both let you advance a non-random access iterator using a single function call.
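
For example, a minimal sketch of stepping a std::list iterator with them (the container contents here are just illustrative):

#include <iostream>
#include <iterator>
#include <list>

int main() {
    std::list<int> values{10, 20, 30, 40, 50};

    // std::next returns an advanced copy of the iterator; for a list
    // it walks the links one node at a time, so it is O(n) in the offset
    auto third = std::next(values.begin(), 2);
    std::cout << *third << '\n';   // 30

    // std::prev steps backwards (bidirectional iterators only)
    auto second = std::prev(third);
    std::cout << *second << '\n';  // 20
}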

Instead of exposing an inefficient +, the designers of the C++ standard said "no, that would be bad to use".

std::sorting a random access iterator with an O(n) + would take O(n^2 lg n) time or worse. Quicksort is not an efficient way to sort forward iterators.

Meanwhile, a sort based on std::merge or std::inplace_merge won't be nearly as inefficient on forward iterators. Simply advance over the range, and as you do, store the boundaries of each power-of-two-sized sorted subrange in a stack-like structure.

Whenever two sorted subranges of the same power-of-two size are adjacent on top of the stack, std::inplace_merge them into a single sorted subrange of twice the size.

Something like this:

#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

template<class It>
struct range_t {
  It b = {};
  It e = {};
  It begin() const { return b; }
  It end() const { return e; }
  bool empty() const { return begin()==end(); }
};
template<class It>
struct merge_range_t:range_t<It> {
  std::size_t pow2 = 0; // this range holds 2^pow2 sorted elements
};
template<class It>
void merge_sort( range_t<It> to_sort ) {
  std::vector< merge_range_t<It> > sorted;
  auto do_merge = [&]{
    // the two topmost ranges are adjacent in the sequence:
    // a comes directly before b, so a.end() == b.begin()
    auto b = sorted.back(); sorted.pop_back();
    auto a = sorted.back(); sorted.pop_back();
    std::inplace_merge( a.begin(), a.end(), b.end() );
    sorted.push_back( {{ a.begin(), b.end() }, a.pow2+1} );
  };
  auto should_merge = [&]{
    if (sorted.size() < 2) return false;
    return sorted.back().pow2 == sorted[sorted.size()-2].pow2;
  };
  while (!to_sort.empty()) {
    // elements of size 1 are always sorted:
    sorted.push_back( { {to_sort.begin(), std::next(to_sort.begin())} } );
    // do merges of matching size as required:
    while(should_merge())
      do_merge();
    // first element no longer needs sorting:
    to_sort = {std::next(to_sort.begin()), to_sort.end()};
  }
  // the remaining sorted regions are not matched in size, but we still
  // need to merge them:
  while (sorted.size() > 1)
    do_merge();
}

This is C++17 for brevity, but it can be written in C++14 or even C++11 easily. It requires bidirectional iterators (std::inplace_merge needs them), and is O(n lg n) unless I made a mistake, though with a far larger constant factor than std::sort.
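
As a usage sketch, assuming the definitions above are in scope (the data here is just illustrative):

#include <iostream>
#include <list>

int main() {
    std::list<int> values{5, 1, 4, 2, 3};

    // build a range_t over the whole list and merge-sort it in place
    merge_sort( range_t<std::list<int>::iterator>{ values.begin(), values.end() } );

    for (int v : values) std::cout << v << ' ';  // 1 2 3 4 5
    std::cout << '\n';
}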

Upvotes: 2

Nicol Bolas

Reputation: 473322

Iterator categories are about more than what is possible; they're also about what is reasonable.

Any forward iterator can be advanced X number of times. But ForwardIterator does not include += for integers. This is important, because it allows code which is written against the RandomAccessIterator requirement to explicitly fail when provided with an iterator that doesn't explicitly provide this interface. And by doing so, such code can declare itself to have particular performance characteristics.

For example, std::sort is O(n log n). But it can only promise that because it requires random access iterators. You could in theory implement the same std::sort algorithm with any BidirectionalIterator, but your performance would be exceedingly bad for non-random access iterators. So bad that you would want to do something drastic about your code, rather than just accepting the performance penalty. Therefore, std::sort outright forbids it.

Or to put it another way, if someone told you to implement sort for a BidirectionalIterator, you would pick a very different algorithm than the one you would for a RandomAccessIterator.
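
A concrete example of this (not from the original answer, but from the standard library itself): std::list provides its own member sort(), a merge-based sort that needs no random access, precisely because std::sort cannot be used with its iterators:

#include <iostream>
#include <list>

int main() {
    std::list<int> values{3, 1, 4, 1, 5};

    // std::sort(values.begin(), values.end()) would not compile:
    // list iterators are only bidirectional. The member function
    // sorts by merging, in approximately N log N comparisons,
    // without needing random access.
    values.sort();

    for (int v : values) std::cout << v << ' ';  // 1 1 3 4 5
    std::cout << '\n';
}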

Other algorithms are able to be more flexible. They may have a faster implementation with random access iterators, but they'd still be using the same general algorithm (in theory). This is why functions like std::advance exist; they allow mere Forward/BidirectionalIterators to have the same integer offsetting behavior as RandomAccessIterators. But you're using them with the full knowledge that it's going to be O(n) for non-RandomAccessIterators. For some algorithms, this performance difference is OK.
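
A short sketch of that trade-off, using std::advance on both a vector and a list (the numbers are just illustrative):

#include <iostream>
#include <iterator>
#include <list>
#include <vector>

int main() {
    std::vector<int> vec{1, 2, 3, 4, 5};
    std::list<int>   lst{1, 2, 3, 4, 5};

    auto vit = vec.begin();
    auto lit = lst.begin();

    // Same call, same result, very different cost:
    std::advance(vit, 3);  // random access: one O(1) jump
    std::advance(lit, 3);  // bidirectional: three single steps, O(n) in the offset

    std::cout << *vit << ' ' << *lit << '\n';  // 4 4
}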

Upvotes: 3

Benjamin Lindley

Reputation: 103693

I'm guessing it's because some operations would have essentially O(N) time complexity, but I'm not sure if that is the key concern.

Yes, that is precisely the key concern. Iterators are modeled after pointers. And with pointers, people have certain expectations. One of those expectations is that pointer addition and subtraction are very fast operations (specifically, O(1)). The designers of the standard library decided to honor those expectations. So if a standard library iterator cannot perform addition and subtraction in O(1), then it does not implement those operations, and is not classified as a random access iterator.

Note that with the increment and decrement operators (++ and --), the performance requirements are slightly relaxed, and there are some iterators which implement those in O(log n) instead of O(1). This compromise is necessary, because if you can't increment or decrement an iterator, it's not much use.
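
std::set and std::map iterators are the usual example of this, assuming the typical balanced-tree implementation used by mainstream standard libraries:

#include <iostream>
#include <set>

int main() {
    std::set<int> s{5, 1, 4, 2, 3};

    // a single ++ may walk up or down the underlying search tree,
    // taking up to O(log n) steps in the worst case, even though a
    // full traversal still visits each node a bounded number of times
    for (auto it = s.begin(); it != s.end(); ++it)
        std::cout << *it << ' ';  // 1 2 3 4 5
    std::cout << '\n';
}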

If we were to implement RandomAccessIterators using this approach, what consequences would this have on using the LinkedList with STL algorithms? Would we suddenly be able to sort a LinkedList using the std::sort function?

Yes. But it would become (at least) an O(n^2) algorithm, instead of the O(n log n) which is promised by the standard.

Upvotes: 5
