Alexey Starinsky
Alexey Starinsky

Reputation: 4339

Why std::vector<uint8_t>::insert works 5 times faster than std::copy with MSVC 2015 compiler?

I have a trivial function that copies a byte block to std::vector:

std::vector<uint8_t> v;

void Write(const uint8_t * buffer, size_t count)
{
    //std::copy(buffer, buffer + count, std::back_inserter(v));

    v.insert(v.end(), buffer, buffer + count);
}

v.reserve(<buffer size>);
v.resize(0);

Write(<some buffer>, <buffer size>);

if I use std::vector<uint8_t>::insert it works 5 times faster than if I use std::copy.

I tried to compile this code with MSVC 2015 with enabled and disabled optimization and got the same result.

Looks like something is strange with std::copy or std::back_inserter implementation.

Upvotes: 1

Views: 843

Answers (3)

Pete Becker
Pete Becker

Reputation: 76438

The call to v.insert is calling a member function of the container. The member function knows how the container is implemented, so it can do things that a more generic algorithm can't do. In particular, when inserting a range of values designated by random-access iterators into a vector, the implementation knows how many elements are being added, so it can resize the internal storage once and then just copy the elements.

The call to std::copy with an insert-iterator, on the other hand, has to call insert for each element. It can't preallocate, because std::copy works with sequences, not containers; it doesn't know how to adjust the size of the container. So for large insertions into a vector the internal storage gets resized each time the vector is full and a new insertion is needed. The overhead of that reallocation is amortized constant time, but the constant is much larger than the constant when only one resizing is done.

With the call to reserve (which I overlooked, thanks, @ChrisDrew), the overhead of reallocating is not as significant. But the implementation of insert knows how many values are being copied, and it knows that those values are contiguous in memory (because the iterator is a pointer), and it knows that the values are trivially copyable, so it will use std::memcpy to blast the bits in all at once. With std::copy, none of that applies; the back inserter has to check whether a reallocation is necessary, and that code can't be optimized out, so you end up with a loop that copies an element at a time, checking for the end of the allocated space for each element. That's much more expensive than a plain std::memcpy.

In general, the more the algorithm knows about the internals of the data structure that it's accessing, the faster it can be. STL algorithms are generic, and the cost of that genericity can be more overhead than a that of a container-specific algorithm.

Upvotes: 3

YSC
YSC

Reputation: 40120

Standard library implementation is written with performance in mind, but performance is achieved only when optimization is ON.

//This reduces the performance dramatically if the optimization is switched off.

Trying to measure a function performance with optimization OFF is as pointless as asking ourselves if the law of gravitation would still be true if there were no mass left in the Universe.

Upvotes: 5

Alan Birtles
Alan Birtles

Reputation: 36488

With a good implementation of std::vector, v.insert(v.end(), buffer, buffer + count); might be implemented as:

size_t count = last-first;
resize(size() + count);
memcpy(data+offset, first, count);

std::copy(buffer, buffer + count, std::back_inserter(v)) on the other hand will be implemented as:

while ( first != last )
{
   *output++ = *first++;
}

which is equivalent to:

while ( first != last )
{
   v.push_back( *first++ );
}

or (roughly):

while ( first != last )
{
   // push_back should be slightly more efficient than this
   v.resize(v.size() + 1);
   v.back() = *first++;
}

Whilst in theory the compiler could optimise the above into a memcpy its unlikely to, at best you'll probably get the methods inlined so that you don't have a function call overhead, it'll still be writing one byte at a time whereas a memcpy will normally use vector instructions to copy multiple bytes at once.

Upvotes: 1

Related Questions