Reputation: 67193
I have an array of integers stored contiguously in memory and I want to add them all to an unordered_set collection.
Right now, I am adding them one at a time.
for (int i = 0; i < count; i++)
collection.insert(pi[i]);
Is there any way to do this more efficiently?
I realize that the items are not stored contiguously within the collection, so it won't be as simple as just handing the array over to the collection. But can this be optimized somehow?
Upvotes: 5
Views: 2807
Reputation: 196
Using the range-based constructor or the insert method will be concise and elegant, but likely no more efficient than your method. The reason is that the iterators passed to these functions are input iterators, not random-access iterators. Because of that, the length of the range cannot be computed up front, and the elements must be inserted one by one, with periodic rehashes when the load factor of the set gets too high.
Consider calling the reserve method of std::unordered_set first (note that your pi is a raw pointer, so it has no size(), begin() or end(); use count and pointer arithmetic instead):
collection.reserve(count);
collection.insert(pi, pi + count);
EDIT: As mentioned in the comments, one could also worry about the cost of hashing the inserted elements one by one. It would then be useful to be able to perform some sort of bulk insertion. However, in the OP's case the elements are integers, which happen to be hashed with the identity function in most, if not all, implementations of std::hash, so hashing does not cost that much ;). Indeed, it is the best hash function one can get for random integers. Other hash functions might be more suitable for "organized" sets.
EDIT2: The comment section is now speculating on what a better implementation of the insert method could look like. I maintain that the range-based insert overload asks for input iterators, so yes, you can actually pass any non-output iterator. Also have a look at the worst-case complexity for range insertion: you will see that it is specified so that inserting the elements one by one is allowed. Finally, take a look at some implementations of the insert method and you will see that there is no particular overload for random-access iterators. This makes sense, as there is no reason to impose an additional check in the insert method when the reserve method is there for the case where we want to give the container at least a given capacity. Based on that, the answer above is very likely the best technique for actual implementations of the stdlib.
Upvotes: 3
Reputation: 4369
unordered_set has a constructor that takes a range of elements to add initially:
template< class InputIt >
unordered_set( InputIt first, InputIt last,
size_type bucket_count = /*implementation-defined*/,
const Hash& hash = Hash(),
const key_equal& equal = key_equal(),
const Allocator& alloc = Allocator() );
So you can just do collection = std::unordered_set<int>(pi, pi + count);
and leave it up to the implementation. (The element type must be spelled out here: with braces and deduction, std::unordered_set{ pi, pi + count } would instead create a two-element set of pointers.)
As another user pointed out in the comments, there's also an overload of insert that takes a range:
template< class InputIt >
void insert( InputIt first, InputIt last );
So, just like with the constructor, you can do collection.insert(pi, pi + count);
There's no guarantee that these overloads will be more efficient, since the complexity is linear on average for both of them, just as it is for inserting the elements one by one.
In fact, if we look at how insert is implemented in MSVC, it's very simple:
template<class _Iter>
void insert(_Iter _First, _Iter _Last)
{ // insert [_First, _Last) at front, then put in place
_DEBUG_RANGE(_First, _Last);
for (; _First != _Last; ++_First)
emplace(*_First);
}
so no optimization for this case.
I think the best way to go about this is to call reserve if you know the number of elements you are going to add, and, if there are many collisions (which there won't be for integers), maybe to modify bucket_count.
Upvotes: 6