Reputation: 340
In the below example I see a performance benefit of element assignment (given a vector of the correct size) versus emplace_back
(given a vector with reserved storage) for a plain old data (POD) struct. Could someone elaborate where this difference comes from?
Thank you very much in advance!
Notes
Code
#include <iostream>
#include <vector>
#include <chrono>
#include <numeric>
using std::cout;
using std::endl;
using std::vector;
using std::size_t;
typedef std::chrono::high_resolution_clock hrc;
typedef std::chrono::microseconds ms;
using std::chrono::duration_cast;
struct Data {
int x, y;
inline Data() noexcept: x(0), y(0) {}
inline Data(int x, int y) noexcept: x(x), y(y) {}
};
int main() {
constexpr size_t n = 1000000;
constexpr size_t reps = 5;
for (size_t rep = 0; rep < reps; rep++) {
{
vector<Data> vec;
vec.reserve(n);
auto t1 = hrc::now();
for (size_t i = 0; i < n; i++)
vec.emplace_back(i, -i);
auto t2 = hrc::now();
cout << "Emplace Back: " << duration_cast<ms>(t2 - t1).count() << " ms" << endl;
// Check
size_t sum = 0;
for (auto const &elem : vec)
sum += elem.x;
if (sum != ((n * (n - 1)) / 2))
return EXIT_FAILURE;
}
{
vector<Data> vec;
vec.resize(n);
auto t1 = hrc::now();
for (size_t i = 0; i < n; i++)
vec[i] = Data(i, i);
auto t2 = hrc::now();
cout << "Assign : " << duration_cast<ms>(t2 - t1).count() << " ms" << endl;
// Check
size_t sum = 0;
for (auto const &elem : vec)
sum += elem.x;
if (sum != ((n * (n - 1)) / 2))
return EXIT_FAILURE;
}
}
}
Output
sysctl -n machdep.cpu.brand_string && clang++ -v && clang++ -o main -std=c++17 -O3 main.cpp && ./main
Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz
Apple clang version 12.0.0 (clang-1200.0.32.29)
Target: x86_64-apple-darwin19.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
Emplace Back: 6162 ms
Assign : 1000 ms
Emplace Back: 2874 ms
Assign : 864 ms
Emplace Back: 2149 ms
Assign : 855 ms
Emplace Back: 2062 ms
Assign : 934 ms
Emplace Back: 2678 ms
Assign : 1030 ms
Upvotes: 1
Views: 1153
Reputation: 85351
First of all, two observations:
y
argument:vec.emplace_back(i, -i);
vs. vec[i] = Data(i, i);
Then some high-level analysis:
The reason the emplace_back
version takes longer than assignment via index access could be because emplace_back
, in addition to creating a new element, needs to grow the vector by 1. Even when there's enough reserved space, growing of the vector involves (1) a check if there's enough space, and (2) an update of the internal vector size field.
Vector index access on the other hand, performs no bounds checking, let alone update any size. It literally does as much as a raw pointer dereference.
The element type, struct Data
is very simple. Creating, copying or overwriting it should take negligible time.
Finally, we analyze the generated assembly to know for sure what's really going on:
emplace_back
version:
leaq 8000000(%rax), %r14
xorl %ebp, %ebp
movq %rbx, %r12
movq %rax, 8(%rsp)
jmp .L14
.L73:
movl %ebp, %eax
movd %ebp, %xmm0
addq $1, %rbp
addq $8, %rbx
negl %eax
movd %eax, %xmm5
punpckldq %xmm5, %xmm0
movq %xmm0, -8(%rbx)
cmpq $1000000, %rbp
je .L72
.L14:
movq %r14, %r15
subq %r12, %r15
cmpq %rbx, %r14
jne .L73
Index access version:
leaq 8000000(%rax), %r13
. . .
pxor %xmm1, %xmm1
movq %rax, %r12
movq %rbp, %rax
.L27:
movdqa .LC3(%rip), %xmm2
movdqa %xmm1, %xmm0
addq $16, %rax
paddq .LC2(%rip), %xmm1
paddq %xmm0, %xmm2
shufps $136, %xmm2, %xmm0
movups %xmm0, -16(%rax)
cmpq %rax, %r13
jne .L27
Conclusion:
y
).addq $1, %rbp
) and checking (cmpq $1000000, %rbp
).Upvotes: 3
Reputation: 40842
emplace_back
Creates an instance of Date
by emplace_back
in the preallocated spaces in vec
assignment
Creates an instance of Date and assigns that to the Date
object that already exists in vec
.
emplace_back
Same as above: Creates an instance of Date
by emplace_back
in the preallocated spaces in vec
.
Increases the size
and checks if new space has to be allocated.
assignment
Just assigns - in this case, because Date
is a really simple object - i
to the members of x
and y
of the already created objected in the resize
step.
So for the assignment case, you completly miss the time required to create the Date
objects because you don't include resize
in the measurment.
Your vec.resize(n);
in the assign case fills the vector with n
elements by default constructing them (a placement new is called n
times). In the loop, an assignment to those created objects happens.
For the first case (emplace_back
) the size
(not the capacity) of the vector is increased by one in each iteration and it starts the lifetime of each added object with a placement new, while for the second case (assign) the size
does not change and you only assign the Date
object to the already constructed one.
The compiler could optimize the assignment case to just assign x
and y
to the already existing instance in the vector without creating an intermediate Date
object.
To measure in a meaningful way you need to include vec.reserve(n)
and vec.resize(n)
. Currently, you just measure some differences in the optimization process of the compilers.
If you also include vec.reserve(n)
and vec.resize(n)
the measuremnts are much closer together.
For such simple classes like the Date
one there won't be much of a difference between assignment and emplace back, because the construction in the loop of the assignment case can often be optimized away.
When you inlcude the vec.reserve(n)
and vec.resize(n)
the timing is:
gcc 8.4
Emplace Back: 11594 ms
Assign : 11516 ms
Emplace Back: 2001 ms
Assign : 2691 ms
Emplace Back: 2523 ms
Assign : 1847 ms
Emplace Back: 1956 ms
Assign : 1277 ms
Emplace Back: 949 ms
Assign : 903 ms
clang
Emplace Back: 2115 ms
Assign : 2640 ms
Emplace Back: 765 ms
Assign : 766 ms
Emplace Back: 666 ms
Assign : 540 ms
Emplace Back: 535 ms
Assign : 515 ms
Emplace Back: 537 ms
Assign : 543 ms
Upvotes: 2