thedandyman

Reputation: 23

Why is a std::set slower in traversing all the elements?

We have code that was originally using a deque as a container. Unfortunately, it wasn't fast enough for our time measurements, since searching and sorting it took quite some time. So I recently refactored the code to use a set instead. It is definitely faster than the deque at searching and sorting.

One thing I noticed, though, is that the set is way slower when traversing all the elements. We have a test that basically traverses the elements from beginning to end until it matches the values it is looking for, and we found that the set takes almost 5x as long as the deque to do it.

Can someone provide an explanation as to why the set is slower? I assumed the times would be about the same, since it's simply a traversal of all the elements from start to end, but that's not the case. I've already done a lot of searching but can't find anything about a set being slower at traversing its elements.

Update: The set/deque contains a class that basically has two integer member variables.

class Element
{
    uint32_t id;
    uint32_t time;
};

Compile options:
-g -pipe -funit-at-a-time
-O3 --param inline-unit-growth=10000 --param large-function-growth=10000 --param max-inline-insns-single=10000
-Wall -Wextra -Wno-aggregate-return -Wno-padded -Wno-reorder -Wno-sign-compare -Wno-unused-parameter -Wcast-align -Wcast-qual -Wdisabled-optimization -Wfloat-equal -Wno-inline -Wlarger-than-10000 -Wmissing-format-attribute -Wpointer-arith -Wredundant-decls -Wno-unknown-pragmas

Upvotes: 2

Views: 225

Answers (2)

Aki Suihkonen

Reputation: 20027

There are two aspects of set traversal that make it harder than list traversal: cache locality, as explained in the comments, and the need to store state in the iterator.

Sets are typically implemented as self-balancing trees, because building a plain binary tree from sorted data would produce a degenerate tree (effectively a linked list), which would not allow O(log N) insertion, deletion, or lookup. The rebalancing means that nodes, even if allocated from adjacent memory addresses, end up being visited in an arbitrary order, leading to more cache misses.

The other issue is that traversing a tree with an iterator requires state to be kept in the iterator: advancing it has to decide whether to descend into a right child or climb back up to a parent, which also increases the number of branch mispredictions.

Upvotes: 1

rsy56640

Reputation: 309

ADD:
A binary tree is indeed cache-unfriendly, but I suspect that is not the main problem.

`deque::iterator++` is just `(index++) % size` when implemented on an array, or following the next pointer when implemented as a linked list.
`set::iterator++` is one step of an in-order traversal of a tree. Imagine the current position is the root's in-order predecessor: the next element is the root, so the iterator has to climb up the tree checking whether the current node is a left child, until it is; the successor is then that node's parent.
So traversal in a deque is far more trivial than traversal in a set.

std::set is implemented as a red-black tree (a kind of balanced BST).
find() exploits the BST ordering and runs in O(log n).
If you instead traverse from the left-most element to the right-most looking for a match, the cost is O(n).
Traversal from begin() to end() is exactly the in-order traversal of the red-black tree, while find() just descends from the root, comparing the key at each node to decide which child to visit.

Upvotes: 0

Related Questions