Reputation: 21160
I was investigating the performance of moving std::string. For the longest time, I've regarded string moves as almost free, thinking the compiler will inline everything and it will only involve a few cheap assignments.
In fact, my mental model for moving is literally
string& operator=(string&& rhs) noexcept
{
    swap(*this, rhs);
    return *this;
}

friend void swap(string& x, string& y) noexcept
{
    // exposition only
    unsigned char buf[sizeof(string)];
    memcpy(buf, &x, sizeof(string));
    memcpy(&x, &y, sizeof(string));
    memcpy(&y, buf, sizeof(string));
}
To the best of my understanding, this is a legal implementation if the memcpy is changed to assigning individual fields.
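For concreteness, here is a minimal sketch of that field-wise variant. It assumes a hypothetical heap-only class, toy_string, with no small-string buffer; the class and its members are invented for illustration and are not how any standard library implements std::string.

```cpp
#include <cstddef>
#include <cstring>
#include <utility>

// Hypothetical heap-only string (no small-string buffer), illustration only.
class toy_string {
    char* data_ = nullptr;
    std::size_t size_ = 0;

public:
    toy_string() = default;

    explicit toy_string(const char* s)
        : data_(new char[std::strlen(s) + 1]), size_(std::strlen(s)) {
        std::memcpy(data_, s, size_ + 1);  // copy the terminating '\0' too
    }

    // Copying is disabled to keep the sketch short.
    toy_string(const toy_string&) = delete;
    toy_string& operator=(const toy_string&) = delete;

    // Members start out empty (see the default initializers), then swap.
    toy_string(toy_string&& rhs) noexcept { swap(*this, rhs); }

    // Move assignment as in the mental model: a plain swap.
    toy_string& operator=(toy_string&& rhs) noexcept {
        swap(*this, rhs);
        return *this;
    }

    ~toy_string() { delete[] data_; }

    // Field-wise swap: two pointer-sized exchanges, no allocation, cannot throw.
    friend void swap(toy_string& x, toy_string& y) noexcept {
        std::swap(x.data_, y.data_);
        std::swap(x.size_, y.size_);
    }

    const char* c_str() const noexcept { return data_ ? data_ : ""; }
    std::size_t size() const noexcept { return size_; }
};
```

With this layout, b = std::move(a) leaves b owning a's old buffer and a owning whatever b held before. A string with a small-string buffer cannot be swapped this simply, because the data pointer may point into the object itself.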
To my great surprise, gcc's implementation of move assignment involves creating a new string and might throw because of the allocation, despite being noexcept.
Is this even conforming? Equally important, should I not think moving is almost free?
Bewilderingly, std::vector<char> compiles down to what I'd expect.
clang's implementation is quite different, although there is a suspicious std::string::reserve.
Upvotes: 13
Views: 1170
Reputation: 136485
Not exactly an answer, but this is the new C++11 implementation of std::string, without the reference counter and with the small string optimisation, which causes the voluminous assembly. In particular, the small string optimisation causes 4 branches to handle the 4 different combinations of lengths of the source and the target of the move assignment.
When the -D_GLIBCXX_USE_CXX11_ABI=0 option is added to use the pre-C++11 std::string, with a reference counter and no small string optimisation, the assembly code looks much better.
should I not think moving is almost free?
Roger Orr's talk "Nothing is Better than Copy or Move" says on page 47 of the slides:
Comparison of copy and move
- Many people incorrectly think of move as effectively 'free'
- The performance difference between copy and move varies widely
- For a primitive type, such as int, copying or moving are effectively identical
- Move is faster than copy when only part of the object needs to be transferred to transfer the whole value
Upvotes: 0
Reputation: 29970
I've only analyzed GCC's version. Here's what's going on: the code handles different kinds of allocators. If the allocator has the trait _S_propagate_on_move_assign or _S_always_equal, then the move is almost free, as you expect. This is the if in the move operator=:
if (!__str._M_is_local()
    && (_Alloc_traits::_S_propagate_on_move_assign()
        || _Alloc_traits::_S_always_equal()))
    // cheap move
else
    assign(__str);
If the condition is true (_M_is_local() means the string is small, i.e. stored in the in-object buffer), then the move is cheap.
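The same decision can be sketched with the standard allocator traits. This assumes that _S_propagate_on_move_assign and _S_always_equal correspond to propagate_on_container_move_assignment and is_always_equal (C++17); can_steal_buffer and arena_alloc are hypothetical names made up for this example, not part of libstdc++.

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>

// Hypothetical helper mirroring the condition quoted above:
// the buffer can simply be taken over if the source is not a small string
// and the allocator either propagates on move assignment or always
// compares equal.
template <class Alloc>
constexpr bool can_steal_buffer(bool source_is_local) {
    using traits = std::allocator_traits<Alloc>;
    return !source_is_local
        && (traits::propagate_on_container_move_assignment::value
            || traits::is_always_equal::value);
}

// Hypothetical stateful allocator: it is not empty, so is_always_equal
// defaults to false, and it does not opt in to propagation.
template <class T>
struct arena_alloc {
    using value_type = T;
    int arena_id = 0;
    T* allocate(std::size_t n) { return std::allocator<T>{}.allocate(n); }
    void deallocate(T* p, std::size_t n) { std::allocator<T>{}.deallocate(p, n); }
};

// Default allocator, long string: cheap path.
static_assert(can_steal_buffer<std::allocator<char>>(false), "");
// Default allocator, small string: falls through to assign.
static_assert(!can_steal_buffer<std::allocator<char>>(true), "");
// Stateful allocator without the traits: falls through to assign.
static_assert(!can_steal_buffer<arena_alloc<char>>(false), "");
```

The static_asserts only restate the three cases; what the library does on the fallback path is up to the implementation.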
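If it is false, then it calls the normal assign (not the moving one). This is the case when either:
- the string is small: assign will do a simple memcpy (cheap)
- the allocator has neither of the traits mentioned above: assign may have to allocate (not cheap)
What does this mean?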
It means that if you use the default allocator (or any allocator with the traits mentioned earlier), then the move is still almost free.
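One way to observe this from the outside is to compare data() before and after the move. This is a rough sketch: the standard does not promise either outcome, and move_reuses_buffer is a hypothetical helper written for this example. On implementations with a small string optimisation and the default allocator, I would expect a long string to hand over its heap buffer and a short one to have its characters copied.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: returns true if move-assigning `src` into an empty
// string handed over the source's character buffer instead of copying it.
inline bool move_reuses_buffer(std::string src) {
    const char* before = src.data();  // where the characters live now
    std::string dst;
    dst = std::move(src);             // the move assignment under discussion
    return dst.data() == before;      // same address: buffer was taken over
}
```

Calling move_reuses_buffer(std::string(1000, 'x')) should typically report that the buffer was reused, while move_reuses_buffer("hi") should not, because the short string lives inside the source object itself.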
On the other hand, the generated code is unnecessarily large and could, I think, be improved. It should either have a separate code path for the usual allocators or a better assign. The problem is that assign doesn't check _M_is_local() but does a capacity check, so the compiler cannot tell whether an allocation is needed and emits the allocation code path unnecessarily. You can check the exact details in the source code.
Upvotes: 1