Shachar Shemesh
Shachar Shemesh

Reputation: 8563

std::set erase complexity anomality?

I am trying to figure out the complexity of erasing multiple elements from std::set. I am using this page as source.

It claims that the complexity for erasing a single item using an iterator is amortized O(1), but erasing multiple items using the range form is log(c.size()) + std::distance(first, last) (i.e. - log of the set's size + the number of elements deleted).

Taken at face value, if the number of elements to be erased (n) is much smaller than the number of elements in the set (m), this means that looping over the elements to be erased and erasing them one at a time is quicker (O(n)) than erasing them with one call (O(log m) assuming n<<m).

Obviously, had that really been the case, the internal implementation of the second form would just do the above loop.

Is this an error at the site? A bug in the specs? Am I just missing something?

Thanks, Shachar

Upvotes: 11

Views: 16362

Answers (2)

Andrushenko Alexander
Andrushenko Alexander

Reputation: 1973

Internally elements of a set are stored in a balanced binary tree. Balanced tree is a tree where maximal height difference between any left and right subtree of any node is 1.

Maintaining balanced structure is important to guarantee that a search of any element in the tree (in the set) takes in worst case O(log(n)) steps.

Removal of an element may destroy balance. To restore balance rotations must be performed. In some cases a single removal causes several rotations so that the operation takes O(log(n)) steps, but in average a removal takes O(1) steps.

So, when several elements scattered over the set must be deleted, one by one, the amortized costs with high probability will be O(1) per removal.

Removing several elements in the range (first, last, where one element follows the next) will almost certainly destroy the balance, what causes the log factor in the complexity: log(n) + std::distance(first, last)

Upvotes: 6

Shachar Shemesh
Shachar Shemesh

Reputation: 8563

It seems the problem is hiding behind the (somewhat weasel) word "amortized". The single item erase has O complexity of log(c.size()), but amortized complexity of O(1).

Performing multiple single erases in a loop will thus cost log(c.size()) + number of erases, which is exactly what the range form's complexity is.

Shachar

Upvotes: 2

Related Questions