Reputation: 3693
I was benchmarking some STL algorithms, and I was surprised by the time taken by the following code: (I measured the g++ compiled code [no optimizations] with the time
command)
#include <vector>
struct vec2{
int x, y;
vec2():x(0), y(0) {}
};
int main(int argc, char* argv[]){
const int size = 200000000;
std::vector<vec2> tab(size); //2.26s
// vec2* tab = new vec2[size]; //1.29s
// tab[0].x = 0;
// delete[] tab;
return 0;
}
The time taken by a vector initialization is 2.26s while a new
(and delete
) takes 1.29s. What is the vector ctor doing that would take so much longer? new[]
calls the constructor on every element, just as the vector
ctor would, right?
I then compiled with -O3, it went all faster, but there was still a gap between the two codes. (I got respectively 0.83s and 0.75s)
Any ideas?
Upvotes: 7
Views: 439
Reputation: 71555
Both versions initialize the memory.
As several people have pointed out, the vector uses copy construction while the array uses the default constructor. Your compiler appears to optimize the latter better than the former.
Note that in Real Life, you rarely want to initialize such a huge array in one fell swoop. (What use are a bunch of zeroes? Obviously you intend to put something else in there eventually... And initializing hundreds of megabytes is very cache-unfriendly.)
Instead, you would write something like:
const int size = 200000000;
std::vector<vec2> v;
v.reserve(size);
Then when you are ready to put a real element into the vector, you use v.push_back(element)
. The reserve()
allocates memory without initializing it; the push_back()
copy-constructs into the reserved space.
Alternatively, when you want to put a new element into the vector, you can use v.resize(v.size()+1)
and then modify the element v.back()
. (This is how a "pool allocator" might work.) Although this sequence will initialize the element and then overwrite it, it will all happen in the L1 cache, which is almost as fast as not initializing it at all.
So for a fair comparison, try a large vector (with reserve
) vs. an array for creating a sequence of non-identical items. You should find the vector is faster.
Upvotes: 7
Reputation: 5538
After analysing assembly generated by VC++ for these two cases, here is what I found. Compiler inlined practically everything and generated very similar loops for initialization after memory allocation. In case of vector inner loop looks like this:
013E3FC0 test eax,eax
013E3FC2 je std::_Uninit_def_fill_n<vec2 *,unsigned int,vec2,std::allocator<vec2>,vec2>+19h (13E3FC9h)
013E3FC4 mov dword ptr [eax],edx
013E3FC6 mov dword ptr [eax+4],esi
013E3FC9 add eax,8
013E3FCC dec ecx
013E3FCD jne std::_Uninit_def_fill_n<vec2 *,unsigned int,vec2,std::allocator<vec2>,vec2>+10h (13E3FC0h)
where edx
and esi
registers were zerroed outside of the loop:
00013FB5 xor edx,edx
00013FB7 xor esi,esi
00013FB9 lea esp,[esp]
In case of new[]
inner loop looks like this:
009F1800 mov dword ptr [ecx],0
009F1806 mov dword ptr [ecx+4],0
009F180D add ecx,8
009F1810 dec edx
009F1811 jns main+30h (9F1800h)
Differences are very insignificant, a few more instructions in case of vector
, but probably also faster mov
s from registers. Since in most real-life cases, constructors do a lot more than assigning zeros, this difference can hardly be noticeable at all. So the value of this testing is questionable.
Upvotes: 2
Reputation: 320631
The speed will depend on implementation, but most likely reason for the vector being slower is that vector cannot default-construct its elements. Vector elements are always copy-constructed. For example
std::vector<vec2> tab(size);
is in reality interpreted as
std::vector<vec2> tab(size, vec2());
i.e. the second argument gets its value from default argument. The vector then allocates raw memory and copies this default-constructed element passed from the outside into every element of the new vector (by using copy-constructor). This could be generally slower than default-constructing each element directly (as new[]
does).
To illustrate the difference with a code sketch, new vec2[size]
is roughly equivalent to
vec2 *v = (vec2 *) malloc(size * sizeof(vec2));
for (size_t i = 0; i < size; ++i)
// Default-construct `v[i]` in place
new (&v[i]) vec2();
return v;
while vector<vec2>(size)
is roughly equivalent to
vec2 source; // Default-constructed "original" element
vec2 *v = (vec2 *) malloc(size * sizeof(vec2));
for (size_t i = 0; i < size; ++i)
// Copy-construct `v[i]` in place
new (&v[i]) vec2(source);
return v;
Depending on the implementation, the second approach might turn out slower.
The two-times difference in speed is hard to justify though, but benchmarking non-optimized code makes no sense either. The much less significant difference you observed with optimized code is exactly what one might reasonably expect in this case.
Upvotes: 9