Reputation: 3941
Editor's note: for a follow-up question with optimization enabled that times only the loop, see
Why is iterating though `std::vector` faster than iterating though `std::array`?,
where we can see the effect of lazy-allocation page faults when reading uninitialized BSS memory vs. dynamically-allocated memory that is written (and thus initialized) outside the timed loop.
I tried profiling this code:
#include <vector>
#include <array>
#include <stdio.h>
using namespace std;
constexpr int n = 400'000'000;
//vector<int> v(n);
array<int, n> v;
int main()
{
    int res = 0;
    for(int x : v)
        res += x;
    printf("%d\n", res);
}
On my machine, the `array` version is faster than the `vector` version.
The memory allocation is irrelevant in this case as it happens only once.
$ g++ arrVsVec.cpp -O3
$ time ./a.out    # array version
0
real 0m0,445s
user 0m0,203s
sys 0m0,238s
$ g++ arrVsVec.cpp -O3
$ time ./a.out    # vector version
0
real 0m0,749s
user 0m0,273s
sys 0m0,476s
I have found that the disassembly is much more complicated for `std::vector`: https://godbolt.org/z/111L5G
Upvotes: 3
Views: 666
Reputation: 2992
You're not using the result, you're initializing the vector to zeroes, and you didn't enable optimizations. Once you do:
#include <cstdlib>   // rand
#include <iostream>

// v is the global std::array / std::vector from the question.
int main()
{
    unsigned int res = 0;
    // Fill with values the compiler cannot predict, so the sum isn't folded away.
    for(int &x : v)
        x = rand();
    // Repeat the summing loop so it dominates the one-time allocation cost.
    for(int i = 0; i < 128; ++i) {
        for(int x : v)
            res += (unsigned int)x;
    }
    std::cout << res;
}
times are identical:
Success #stdin #stdout 0.62s 54296KB
-2043861760
Success #stdin #stdout 0.62s 54296KB
-2043861760
EDIT: changed the type of `res` to `unsigned int` to avoid undefined behaviour from signed overflow.
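Signed integer overflow is undefined behaviour in C++, whereas unsigned arithmetic is defined to wrap around; a minimal sketch of the difference:

#include <cstdio>
#include <limits>

int main()
{
    // Unsigned overflow is well defined: the value wraps around.
    unsigned int u = std::numeric_limits<unsigned int>::max();
    u += 1;                  // wraps to 0
    std::printf("%u\n", u);  // prints 0

    // The signed equivalent would be undefined behaviour:
    // int s = std::numeric_limits<int>::max();
    // s += 1;               // signed overflow: UB
}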
Upvotes: 6
Reputation: 238351
> How I compile:
> g++ arrVsVec.cpp

> Why is iterating an std::array much faster than iterating an std::vector?
Because you did not compile with optimisations enabled.
Furthermore, you don't use the result of the iteration for anything, and the entire computation uses compile-time constant inputs, so even if you did enable optimisation, the loop would probably be optimised away, and you would then only be measuring the dynamic allocation versus no dynamic allocation. Hint: performing a dynamic allocation is infinitely slower than doing nothing.
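To illustrate the point, a minimal sketch of a benchmark that avoids both problems (the smaller `n` and the use of `rand()` are arbitrary choices for the example):

#include <array>
#include <cstdio>
#include <cstdlib>

constexpr int n = 1'000'000;   // smaller than in the question, just for illustration
std::array<int, n> v;

int main()
{
    // Fill with values the compiler cannot know at compile time,
    // so the summing loop cannot be constant-folded away.
    for (int &x : v)
        x = std::rand();

    unsigned int res = 0;
    for (int x : v)
        res += (unsigned int)x;

    // Using the result keeps the loop from being removed as dead code.
    std::printf("%u\n", res);
}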
So, to conclude: this benchmark doesn't measure iteration at all; with optimisation off you measure unoptimised code, and with optimisation on you would mostly be measuring the vector's one-time dynamic allocation.
Upvotes: 6
Reputation: 85361
Answer for optimization on (`g++ -O2`):
You're not using the end result, so the compiler is free to optimize the entire loop out, which is what happens in the `array` case:
main:
xor eax, eax
ret
But the `vector` allocates and deallocates heap memory, which complicates the optimization, and compilers tend to play it safe and leave it in place:
main:
xor eax, eax
ret
_GLOBAL__sub_I_v:
sub rsp, 8
mov edi, 400000000
mov QWORD PTR v[rip], 0
mov QWORD PTR v[rip+8], 0
mov QWORD PTR v[rip+16], 0
call operator new(unsigned long)
lea rdx, [rax+400000000]
mov QWORD PTR v[rip], rax
mov QWORD PTR v[rip+16], rdx
.L6:
mov DWORD PTR [rax], 0
add rax, 4
cmp rdx, rax
jne .L6
mov QWORD PTR v[rip+8], rdx
mov esi, OFFSET FLAT:v
mov edx, OFFSET FLAT:__dso_handle
mov edi, OFFSET FLAT:_ZNSt6vectorIiSaIiEED1Ev
add rsp, 8
jmp __cxa_atexit
So that's why the `array` version is faster in this particular case. In a more real-life application the difference wouldn't be that dramatic, and would mostly come down to the heap allocation/deallocation time for the `vector` case.
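For reference, a sketch (not from the original answer) of timing only the iteration with `std::chrono`, with the allocation and the first touch of every page moved out of the timed region; the container size and the fill value `1` are arbitrary choices for the example:

#include <array>
#include <chrono>
#include <cstdio>
#include <vector>

constexpr int n = 50'000'000;   // large enough for the loop to take measurable time

std::array<int, n> a;           // static storage, zero-initialized BSS
std::vector<int> v(n);          // heap allocation + zero-initialization

// Sum one container and print how long the pass took; returning the sum
// keeps the loop from being optimized away.
template <class C>
long long timed_sum(const C &c, const char *name)
{
    auto t0 = std::chrono::steady_clock::now();
    long long sum = 0;
    for (int x : c)
        sum += x;
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                  std::chrono::steady_clock::now() - t0).count();
    std::printf("%s: %lld ms\n", name, (long long)ms);
    return sum;
}

int main()
{
    // Touch every element first so page faults happen outside the timed loops.
    for (int &x : a) x = 1;
    for (int &x : v) x = 1;

    long long total = timed_sum(a, "array") + timed_sum(v, "vector");
    std::printf("%lld\n", total);
}

With both containers pre-touched, the two loops typically compile to the same machine code at `-O2` and run in roughly the same time.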
Answer for optimization off (`g++`):
Don't benchmark stuff compiled without optimization.
The difference is probably due to the `vector` iterator not being inlined, so accessing vector elements in debug mode incurs an extra indirection compared to array access.
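As an illustration (a sketch assuming GCC's libstdc++, where `std::array`'s iterator is a plain `int*` while `std::vector`'s is the `__gnu_cxx::__normal_iterator` wrapper class), the two range-for loops expand to roughly the following:

#include <array>
#include <vector>

std::array<int, 8> a{};
std::vector<int> v(8);

int main()
{
    int res = 0;

    // With GCC's libstdc++, std::array's iterator type is a plain int*, so
    // this loop is ordinary pointer arithmetic even at -O0:
    for (int *it = a.begin(), *end = a.end(); it != end; ++it)
        res += *it;

    // std::vector's iterator is a wrapper class (__gnu_cxx::__normal_iterator);
    // without optimization its operator*, operator++ and operator!= remain
    // real, non-inlined function calls (the extra indirection mentioned above):
    for (std::vector<int>::iterator it = v.begin(), end = v.end(); it != end; ++it)
        res += *it;

    return res;
}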
Upvotes: 7