Reputation: 3941
Editor's note: for a follow-up question with optimization enabled that times only the loop, see
Why is iterating though `std::vector` faster than iterating though `std::array`?,
where we can see the effect of lazy-allocation page faults when reading uninitialized BSS memory vs. dynamically-allocated memory that is written (and thus initialized) outside the timed loop.
I tried profiling this code:
#include <vector>
#include <array>
#include <stdio.h>
using namespace std;
constexpr int n = 400'000'000;
//vector<int> v(n);
array<int, n> v;
int main()
{
    int res = 0;
    for(int x : v)
        res += x;
    printf("%d\n", res);
}
On my machine, the `array` version is faster than the `vector` version.
The memory allocation is irrelevant in this case as it happens only once.
$ g++ arrVsVec.cpp -O3
$ time ./a.out    # array version
0
real 0m0,445s
user 0m0,203s
sys 0m0,238s
$ g++ arrVsVec.cpp -O3
$ time ./a.out    # vector version
0
real 0m0,749s
user 0m0,273s
sys 0m0,476s
I have found that the disassembly is much more complicated for `std::vector`: https://godbolt.org/z/111L5G
Upvotes: 3
Views: 666
Reputation: 2992
You're not using the result, you're initializing the vector to zeroes, and you didn't enable optimizations. Once you do:
#include <cstdlib>   // rand
#include <iostream>

// v is the global std::array / std::vector from the question.
int main()
{
    unsigned int res = 0;
    // Fill with values the compiler cannot predict, so the sum isn't folded away.
    for(int &x : v)
        x = rand();
    // Repeat the summing loop so it dominates the one-time allocation cost.
    for(int i = 0; i < 128; ++i) {
        for(int x : v)
            res += (unsigned int)x;
    }
    std::cout << res;
}
times are identical:
Success #stdin #stdout 0.62s 54296KB
-2043861760
Success #stdin #stdout 0.62s 54296KB
-2043861760
EDIT: changed the type of `res` to `unsigned int` to avoid undefined behaviour from signed overflow.
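Signed integer overflow is undefined behaviour in C++, whereas unsigned arithmetic is defined to wrap around; a minimal sketch of the difference:

#include <cstdio>
#include <limits>

int main()
{
    // Unsigned overflow is well defined: the value wraps around.
    unsigned int u = std::numeric_limits<unsigned int>::max();
    u += 1;                  // wraps to 0
    std::printf("%u\n", u);  // prints 0

    // The signed equivalent would be undefined behaviour:
    // int s = std::numeric_limits<int>::max();
    // s += 1;               // signed overflow: UB
}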
Upvotes: 6
Reputation: 238351
> How I compile:
> g++ arrVsVec.cpp

> Why is iterating an std::array much faster than iterating an std::vector?
Because you did not compile with optimisations enabled.
Furthermore, you don't use the result of the iteration for anything, and the entire computation uses compile-time constant inputs, so even if you did enable optimisation, the loop would probably be optimised away, and you would then only be measuring the dynamic allocation versus no dynamic allocation. Hint: performing a dynamic allocation is infinitely slower than doing nothing.
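To illustrate the point, a minimal sketch of a benchmark that avoids both problems (the smaller `n` and the use of `rand()` are arbitrary choices for the example):

#include <array>
#include <cstdio>
#include <cstdlib>

constexpr int n = 1'000'000;   // smaller than in the question, just for illustration
std::array<int, n> v;

int main()
{
    // Fill with values the compiler cannot know at compile time,
    // so the summing loop cannot be constant-folded away.
    for (int &x : v)
        x = std::rand();

    unsigned int res = 0;
    for (int x : v)
        res += (unsigned int)x;

    // Using the result keeps the loop from being removed as dead code.
    std::printf("%u\n", res);
}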
So, to conclude: this benchmark doesn't measure iteration at all; with optimisation off you measure unoptimised code, and with optimisation on you would mostly be measuring the vector's one-time dynamic allocation.
Upvotes: 6
Reputation: 85361
Answer for optimization on (`g++ -O2`):
You're not using the end result, so the compiler is free to optimize the entire loop out, which is what happens in the `array` case:
main:
xor eax, eax
ret
But the `vector` allocates and deallocates heap memory, which complicates the optimization, and compilers tend to play it safe and leave it in place:
main:
xor eax, eax
ret
_GLOBAL__sub_I_v:
sub rsp, 8
mov edi, 400000000
mov QWORD PTR v[rip], 0
mov QWORD PTR v[rip+8], 0
mov QWORD PTR v[rip+16], 0
call operator new(unsigned long)
lea rdx, [rax+400000000]
mov QWORD PTR v[rip], rax
mov QWORD PTR v[rip+16], rdx
.L6:
mov DWORD PTR [rax], 0
add rax, 4
cmp rdx, rax
jne .L6
mov QWORD PTR v[rip+8], rdx
mov esi, OFFSET FLAT:v
mov edx, OFFSET FLAT:__dso_handle
mov edi, OFFSET FLAT:_ZNSt6vectorIiSaIiEED1Ev
add rsp, 8
jmp __cxa_atexit
So that's why the `array` version is faster in this particular case. In a more real-life application the difference wouldn't be that dramatic, and would mostly come down to the heap allocation/deallocation time for the `vector` case.
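For reference, a sketch (not from the original answer) of timing only the iteration with `std::chrono`, with the allocation and the first touch of every page moved out of the timed region; the container size and the fill value `1` are arbitrary choices for the example:

#include <array>
#include <chrono>
#include <cstdio>
#include <vector>

constexpr int n = 50'000'000;   // large enough for the loop to take measurable time

std::array<int, n> a;           // static storage, zero-initialized BSS
std::vector<int> v(n);          // heap allocation + zero-initialization

// Sum one container and print how long the pass took; returning the sum
// keeps the loop from being optimized away.
template <class C>
long long timed_sum(const C &c, const char *name)
{
    auto t0 = std::chrono::steady_clock::now();
    long long sum = 0;
    for (int x : c)
        sum += x;
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                  std::chrono::steady_clock::now() - t0).count();
    std::printf("%s: %lld ms\n", name, (long long)ms);
    return sum;
}

int main()
{
    // Touch every element first so page faults happen outside the timed loops.
    for (int &x : a) x = 1;
    for (int &x : v) x = 1;

    long long total = timed_sum(a, "array") + timed_sum(v, "vector");
    std::printf("%lld\n", total);
}

With both containers pre-touched, the two loops typically compile to the same machine code at `-O2` and run in roughly the same time.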
Answer for optimization off (`g++`):
Don't benchmark stuff compiled without optimization.
The difference is probably due to the `vector` iterator not being inlined, so accessing vector elements in debug mode incurs an extra indirection compared to array access.
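As an illustration (a sketch assuming GCC's libstdc++, where `std::array`'s iterator is a plain `int*` while `std::vector`'s is the `__gnu_cxx::__normal_iterator` wrapper class), the two range-for loops expand to roughly the following:

#include <array>
#include <vector>

std::array<int, 8> a{};
std::vector<int> v(8);

int main()
{
    int res = 0;

    // With GCC's libstdc++, std::array's iterator type is a plain int*, so
    // this loop is ordinary pointer arithmetic even at -O0:
    for (int *it = a.begin(), *end = a.end(); it != end; ++it)
        res += *it;

    // std::vector's iterator is a wrapper class (__gnu_cxx::__normal_iterator);
    // without optimization its operator*, operator++ and operator!= remain
    // real, non-inlined function calls (the extra indirection mentioned above):
    for (std::vector<int>::iterator it = v.begin(), end = v.end(); it != end; ++it)
        res += *it;

    return res;
}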
Upvotes: 7