Andrew Tomazos

Reputation: 68618

Simultaneously iterating over and modifying an unordered_set?

Consider the following code:

unordered_set<T> S = ...;

for (const auto& x : S)
   if (...)
       S.insert(...);

This is broken, correct? If we insert something into S, the iterators may be invalidated (due to a rehash), which breaks the range-for because under the hood it uses S.begin() ... S.end().

Is there some pattern to deal with this?

One way is:

unordered_set<T> S = ...;

vector<T> S2;

for (const auto& x : S)
   if (...)
       S2.emplace_back(...);

for (auto& x : S2)
    S.insert(move(x));

This seems clunky. Is there a better way I'm missing?

(Specifically if I was using a hand-rolled hash table and I could block it from rehashing until the end of the loop, it would be safe to use the first version.)

Update:

From http://en.cppreference.com/w/cpp/container/unordered_map/insert

If rehashing occurs due to the insertion, all iterators are invalidated. Otherwise iterators are not affected. References are not invalidated. Rehashing occurs only if the new number of elements is higher than max_load_factor() * bucket_count().

Could you mess with max_load_factor somehow to prevent rehashing?

Upvotes: 32

Views: 5847

Answers (3)

GManNickG

Reputation: 503805

Could you mess with max_load_factor somehow to prevent rehashing?

Yes, you can set the max_load_factor() to infinity to ensure no rehashing occurs:

#include <iostream>
#include <limits>
#include <unordered_set>

int main()
{
    // initialize
    std::unordered_set<int> S;

    for (int i = 0; i < 8; ++i)
        S.insert(i);

    std::cout << "buckets: " << S.bucket_count() << std::endl;

    // infinite max load factor => never need to rehash
    const auto oldLoadFactor = S.max_load_factor();
    S.max_load_factor(std::numeric_limits<float>::infinity());

    for (const auto& x : S)
    {
        if (x > 2)
            S.insert(x * 2);
    }

    // restore load factor, verify same bucket count
    S.max_load_factor(oldLoadFactor);
    std::cout << "buckets: " << S.bucket_count() << std::endl;

    // now force rehash
    S.rehash(0);
    std::cout << "buckets: " << S.bucket_count() << std::endl;
}

Note that simply setting a new load factor does no rehashing, so those are cheap operations.

The rehash(0) bit works because it's a request that: 1) I get at least n buckets, and 2) I have enough buckets to satisfy my max_load_factor(). We just use zero to indicate we don't care about a minimum amount; we just want to rehash to satisfy our "new" factor, as if it was never changed to infinity.

Of course, this isn't exception-safe; if anything throws between the calls to max_load_factor(), our old factor is lost forever. Easily fixed with your favorite scope-guard utility or a utility class.

Note that you get no guarantee whether you'll iterate over the new elements. You will iterate over the existing elements, but you may or may not iterate over the new ones. If that is okay (which per our chat it should be), then this will work.

For example, say you iterate over an unordered set of integers and for each even integer x, insert x * 2. If those always get inserted just after your current position (by chance of implementation detail and container state), you will never terminate the loop except through exceptions.

If you do need some guarantees, you need to go with an alternate storage solution.

Upvotes: 23

Dietmar Kühl

Reputation: 153810

I realize that it is conceptually the same as what you proposed, but I think it actually looks reasonably slick:

std::vector<T> tmp;
std::copy_if(S.begin(), S.end(), std::back_inserter(tmp),
             [](T const& value) { return ...; });
S.insert(std::make_move_iterator(tmp.begin()),
         std::make_move_iterator(tmp.end()));

Upvotes: 2

Useless

Reputation: 67723

Modifying any container while you're iterating over it tends to get hairy - even if it's a simpler structure than a hash, or even if you can prevent it from re-hashing, re-balancing or whatever.

Even if it did work, by the way, there's an ambiguity: should your newly-inserted members be iterated over or not? Is it ok to include them in this iteration only sometimes (ie, only if they happen to end up after the current iterator)?

If you need to do this a lot, you could usefully wrap the container in a generic adapter that defers all the inserts until the end, but you're really finding a way to hide the code you already have.

Upvotes: 6
