Reputation: 3693

Why is vector(size) slower than new[]?

I was benchmarking some STL algorithms, and I was surprised by the time taken by the following code: (I measured the g++ compiled code [no optimizations] with the time command)

#include <vector>
struct vec2{
    int x, y;
    vec2():x(0), y(0) {}
};
int main(int argc, char* argv[]){
    const int size = 200000000;
    std::vector<vec2> tab(size); //2.26s
//  vec2* tab = new vec2[size]; //1.29s
//  tab[0].x = 0;
//  delete[] tab;
    return 0;
}

The time taken by a vector initialization is 2.26s while a new (and delete) takes 1.29s. What is the vector ctor doing that would take so much longer? new[] calls the constructor on every element, just as the vector ctor would, right?

I then compiled with -O3, it went all faster, but there was still a gap between the two codes. (I got respectively 0.83s and 0.75s)

Any ideas?

Upvotes: 7

Answers (3)

Nemo

Reputation: 71555

Both versions initialize the memory.

As several people have pointed out, the vector uses copy construction while the array uses the default constructor. Your compiler appears to optimize the latter better than the former.

Note that in Real Life, you rarely want to initialize such a huge array in one fell swoop. (What use are a bunch of zeroes? Obviously you intend to put something else in there eventually... And initializing hundreds of megabytes is very cache-unfriendly.)

Instead, you would write something like:

const int size = 200000000;
std::vector<vec2> v;
v.reserve(size);

Then when you are ready to put a real element into the vector, you use v.push_back(element). The reserve() allocates memory without initializing it; the push_back() copy-constructs into the reserved space.

Alternatively, when you want to put a new element into the vector, you can use v.resize(v.size()+1) and then modify the element v.back(). (This is how a "pool allocator" might work.) Although this sequence will initialize the element and then overwrite it, it will all happen in the L1 cache, which is almost as fast as not initializing it at all.

So for a fair comparison, try a large vector (with reserve) vs. an array for creating a sequence of non-identical items. You should find the vector is faster.

Upvotes: 7

Gene Bushuyev

Reputation: 5538

After analysing assembly generated by VC++ for these two cases, here is what I found. Compiler inlined practically everything and generated very similar loops for initialization after memory allocation. In case of vector inner loop looks like this:

013E3FC0  test        eax,eax  
013E3FC2  je          std::_Uninit_def_fill_n<vec2 *,unsigned int,vec2,std::allocator<vec2>,vec2>+19h (13E3FC9h)  
013E3FC4  mov         dword ptr [eax],edx  
013E3FC6  mov         dword ptr [eax+4],esi  
013E3FC9  add         eax,8  
013E3FCC  dec         ecx  
013E3FCD  jne         std::_Uninit_def_fill_n<vec2 *,unsigned int,vec2,std::allocator<vec2>,vec2>+10h (13E3FC0h)

where edx and esi registers were zerroed outside of the loop:

00013FB5  xor         edx,edx  
00013FB7  xor         esi,esi  
00013FB9  lea         esp,[esp]

In case of new[] inner loop looks like this:

009F1800  mov         dword ptr [ecx],0  
009F1806  mov         dword ptr [ecx+4],0  
009F180D  add         ecx,8  
009F1810  dec         edx  
009F1811  jns         main+30h (9F1800h)

Differences are very insignificant, a few more instructions in case of vector, but probably also faster movs from registers. Since in most real-life cases, constructors do a lot more than assigning zeros, this difference can hardly be noticeable at all. So the value of this testing is questionable.

Upvotes: 2

AnT stands with Russia

Reputation: 320631

The speed will depend on implementation, but most likely reason for the vector being slower is that vector cannot default-construct its elements. Vector elements are always copy-constructed. For example

std::vector<vec2> tab(size);

is in reality interpreted as

std::vector<vec2> tab(size, vec2());

i.e. the second argument gets its value from default argument. The vector then allocates raw memory and copies this default-constructed element passed from the outside into every element of the new vector (by using copy-constructor). This could be generally slower than default-constructing each element directly (as new[] does).

To illustrate the difference with a code sketch, new vec2[size] is roughly equivalent to

vec2 *v = (vec2 *) malloc(size * sizeof(vec2));

for (size_t i = 0; i < size; ++i)
  // Default-construct `v[i]` in place
  new (&v[i]) vec2();

return v;

while vector<vec2>(size) is roughly equivalent to

vec2 source; // Default-constructed "original" element

vec2 *v = (vec2 *) malloc(size * sizeof(vec2));

for (size_t i = 0; i < size; ++i)
  // Copy-construct `v[i]` in place
  new (&v[i]) vec2(source);

return v;

Depending on the implementation, the second approach might turn out slower.

The two-times difference in speed is hard to justify though, but benchmarking non-optimized code makes no sense either. The much less significant difference you observed with optimized code is exactly what one might reasonably expect in this case.

Upvotes: 9

Why is vector(size) slower than new[]?

Answers (3)

Related Questions