Reputation: 37045
I have been looking through the Clang source code and I found this snippet:
void CompilerInstance::setInvocation(
std::shared_ptr<CompilerInvocation> Value) {
Invocation = std::move(Value);
}
Why would I want to std::move an std::shared_ptr?
Is there any point transferring ownership on a shared resource?
Why wouldn't I just do this instead?
void CompilerInstance::setInvocation(
std::shared_ptr<CompilerInvocation> Value) {
Invocation = Value;
}
Upvotes: 228
Views: 116236
Reputation: 359
Just to mention, I think it's not good practice to move something in a set method; if I set a value somewhere and my value is then cleared, I think this is horrible:
void CompilerInstance::setInvocation(
std::shared_ptr<CompilerInvocation> Value) {
Invocation = std::move(Value);
}
https://en.cppreference.com/w/cpp/utility/move
Unless otherwise specified, all standard library objects that have been moved from are placed in a "valid but unspecified state", meaning the object's class invariants hold
Upvotes: 0
Reputation: 16091
Unfortunately I did not read @yano's answer, so I ran my own benchmark. It's a pity nobody tried to verify the hypotheses around here. My results were similar to yano's, in the sense that the improvement is far from hundreds of times.
On my MacBook Air, move is three times faster (g++ as well as clang++, with -std=c++17 -O3 -DNDEBUG). Let me know if you see problems with the benchmark.
#include <chrono>
#include <iostream>
#include <vector>
#include <memory>
using namespace std;
using namespace std::chrono;
int COUNT = 50'000'000;
struct TimeIt
{
system_clock::time_point start;
TimeIt() {
start = system_clock::now();
}
~TimeIt() {
auto runtime = duration_cast<milliseconds>(system_clock::now()-start).count();
cout << runtime << " ms" << endl;
}
};
void benchmark_copy(const vector<shared_ptr<int>> &vec_src)
{
cout << "benchmark_copy" << endl;
vector<shared_ptr<int>> vec_dst;
vec_dst.reserve(COUNT);
TimeIt ti;
for(auto &sp : vec_src)
vec_dst.emplace_back(sp);
}
void benchmark_move(vector<shared_ptr<int>> &&vec_src)
{
cout << "benchmark_move" << endl;
vector<shared_ptr<int>> vec_dst;
vec_dst.reserve(COUNT);
TimeIt ti;
for(auto &sp : vec_src)
vec_dst.emplace_back(move(sp));
}
int main(){
vector<shared_ptr<int>> vec;
for (int i = 0; i < COUNT; ++i)
vec.emplace_back(new int);
benchmark_copy(vec);
benchmark_move(move(vec));
}
Upvotes: 1
Reputation: 5265
Since none of these answers offered an actual benchmark, I thought I'd try to provide one. However, I think I've left myself more confused than when I started. I tried to come up with a test that would measure passing a shared_ptr<int> by value, by reference, and using std::move, performing an add operation on that value, and returning the result. I did this several times (one million) using two sets of tests. The first set added a constant value to the shared_ptr<int>, the other added a random value in the [0, 10] range. I figured the constant value addition would be a candidate for heavy optimization, whereas the random value test would not. That is more or less what I saw, but the extreme differences in execution time lead me to believe that other factors/problems with this test program are the contributing factors to the execution time differences, not the move semantics.
For no optimizations (-O0), constant addition:
- std::move was ~4x faster than pass-by-value
- std::move was marginally slower than pass-by-reference

For high optimizations (-O3), constant addition:
- std::move was 70-90 thousand times faster than pass-by-value
- std::move was marginally faster than pass-by-reference (anywhere from 1-1.4 times)

For no optimizations (-O0), random addition:
- std::move was 1-2 times faster than pass-by-value
- std::move was marginally slower than pass-by-reference

For high optimizations (-O3), random addition:
- std::move was 1-1.3 times faster than pass-by-value (marginally worse than no optimizations)
- std::move was essentially the same as pass-by-reference

Finally, the test:
#include <memory>
#include <iostream>
#include <chrono>
#include <ctime>
#include <random>
constexpr auto MAX_NUM_ITS = 1000000;
// using random values to try to cut down on massive compiler optimizations
static std::random_device RAND_DEV;
static std::mt19937 RNG(RAND_DEV());
static std::uniform_int_distribution<std::mt19937::result_type> DIST11(0,10);
void CopyPtr(std::shared_ptr<int> myInt)
{
// demonstrates that use_count increases with each copy
std::cout << "In CopyPtr: ref count = " << myInt.use_count() << std::endl;
std::shared_ptr<int> myCopyInt(myInt);
std::cout << "In CopyPtr: ref count = " << myCopyInt.use_count() << std::endl;
}
void ReferencePtr(std::shared_ptr<int>& myInt)
{
// reference count stays the same until a copy is made
std::cout << "In ReferencePtr: ref count = " << myInt.use_count() << std::endl;
std::shared_ptr<int> myCopyInt(myInt);
std::cout << "In ReferencePtr: ref count = " << myCopyInt.use_count() << std::endl;
}
void MovePtr(std::shared_ptr<int>&& myInt)
{
// demonstrates that use_count remains constant with each move
std::cout << "In MovePtr: ref count = " << myInt.use_count() << std::endl;
std::shared_ptr<int> myMovedInt(std::move(myInt));
std::cout << "In MovePtr: ref count = " << myMovedInt.use_count() << std::endl;
}
int CopyPtrFastConst(std::shared_ptr<int> myInt)
{
return 5 + *myInt;
}
int ReferencePtrFastConst(std::shared_ptr<int>& myInt)
{
return 5 + *myInt;
}
int MovePtrFastConst(std::shared_ptr<int>&& myInt)
{
return 5 + *myInt;
}
int CopyPtrFastRand(std::shared_ptr<int> myInt)
{
return DIST11(RNG) + *myInt;
}
int ReferencePtrFastRand(std::shared_ptr<int>& myInt)
{
return DIST11(RNG) + *myInt;
}
int MovePtrFastRand(std::shared_ptr<int>&& myInt)
{
return DIST11(RNG) + *myInt;
}
void RunConstantFunctions(std::shared_ptr<int> myInt)
{
std::cout << "\nIn constant funcs, ref count = " << myInt.use_count() << std::endl;
// demonstrates speed of each function
int sum = 0;
// Copy pointer
auto start = std::chrono::steady_clock::now();
for (auto i=0; i<MAX_NUM_ITS; i++)
{
sum += CopyPtrFastConst(myInt);
}
auto end = std::chrono::steady_clock::now();
std::chrono::duration<double> copyElapsed = end - start;
std::cout << "CopyPtrConst sum = " << sum << ", took " << copyElapsed.count() << " seconds.\n";
// pass pointer by reference
sum = 0;
start = std::chrono::steady_clock::now();
for (auto i=0; i<MAX_NUM_ITS; i++)
{
sum += ReferencePtrFastConst(myInt);
}
end = std::chrono::steady_clock::now();
std::chrono::duration<double> refElapsed = end - start;
std::cout << "ReferencePtrConst sum = " << sum << ", took " << refElapsed.count() << " seconds.\n";
// pass pointer using std::move
sum = 0;
start = std::chrono::steady_clock::now();
for (auto i=0; i<MAX_NUM_ITS; i++)
{
sum += MovePtrFastConst(std::move(myInt));
}
end = std::chrono::steady_clock::now();
std::chrono::duration<double> moveElapsed = end - start;
std::cout << "MovePtrConst sum = " << sum << ", took " << moveElapsed.count() <<
" seconds.\n";
std::cout << "std::move vs pass by value: " << copyElapsed / moveElapsed << " times faster.\n";
std::cout << "std::move vs pass by ref: " << refElapsed / moveElapsed << " times faster.\n";
}
void RunRandomFunctions(std::shared_ptr<int> myInt)
{
std::cout << "\nIn random funcs, ref count = " << myInt.use_count() << std::endl;
// demonstrates speed of each function
int sum = 0;
// Copy pointer
auto start = std::chrono::steady_clock::now();
for (auto i=0; i<MAX_NUM_ITS; i++)
{
sum += CopyPtrFastRand(myInt);
}
auto end = std::chrono::steady_clock::now();
std::chrono::duration<double> copyElapsed = end - start;
std::cout << "CopyPtrRand sum = " << sum << ", took " << copyElapsed.count() << " seconds.\n";
// pass pointer by reference
sum = 0;
start = std::chrono::steady_clock::now();
for (auto i=0; i<MAX_NUM_ITS; i++)
{
sum += ReferencePtrFastRand(myInt);
}
end = std::chrono::steady_clock::now();
std::chrono::duration<double> refElapsed = end - start;
std::cout << "ReferencePtrRand sum = " << sum << ", took " << refElapsed.count() << " seconds.\n";
// pass pointer using std::move
sum = 0;
start = std::chrono::steady_clock::now();
for (auto i=0; i<MAX_NUM_ITS; i++)
{
sum += MovePtrFastRand(std::move(myInt));
}
end = std::chrono::steady_clock::now();
std::chrono::duration<double> moveElapsed = end - start;
std::cout << "MovePtrRand sum = " << sum << ", took " << moveElapsed.count() <<
" seconds.\n";
std::cout << "std::move vs pass by value: " << copyElapsed / moveElapsed << " times faster.\n";
std::cout << "std::move vs pass by ref: " << refElapsed / moveElapsed << " times faster.\n";
}
int main()
{
// demonstrates how use counts are affected between copy and move
std::shared_ptr<int> myInt = std::make_shared<int>(5);
std::cout << "In main: ref count = " << myInt.use_count() << std::endl;
CopyPtr(myInt);
std::cout << "In main: ref count = " << myInt.use_count() << std::endl;
ReferencePtr(myInt);
std::cout << "In main: ref count = " << myInt.use_count() << std::endl;
MovePtr(std::move(myInt));
std::cout << "In main: ref count = " << myInt.use_count() << std::endl;
// myInt was moved from inside MovePtr (into myMovedInt, which was destroyed
// on return), so we have to reinitialize myInt
myInt.reset();
myInt = std::make_shared<int>(5);
RunConstantFunctions(myInt);
RunRandomFunctions(myInt);
return 0;
}
I noticed that for -O0 and -O3, the constant functions compiled to the same assembly under both sets of flags, both relatively short blocks. This makes me think the majority of the optimization comes from the calling code, but with my amateur assembly knowledge I can't really see that.
The random functions compiled to quite a bit of assembly, even for -O3
, so the random part must be dominating that routine.
So in the end, not really sure what to make of this. Please throw darts at it, tell me what I did wrong, offer some explanations.
Upvotes: 10
Reputation: 38156
At least with libstdc++ you should get the same performance with move and assignment because operator=
calls std::move
on the incoming pointer. See: https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/shared_ptr.h#L384
Upvotes: 4
Reputation: 26486
I think the one thing the other answers did not emphasize enough is the point of speed.

std::shared_ptr's reference count is atomic. Increasing or decreasing the reference count requires an atomic increment or decrement. This is hundreds of times slower than a non-atomic increment/decrement, not to mention that if we increment and then decrement the same counter, we wind up with the exact same number, wasting a ton of time and resources in the process.

By moving the shared_ptr instead of copying it, we "steal" the atomic reference count and we nullify the other shared_ptr. "Stealing" the reference count is not atomic, and it is hundreds of times faster than copying the shared_ptr (and causing an atomic reference increment or decrement).

Do note that this technique is used purely for optimization. Copying it (as you suggested) is just as fine functionality-wise.
Upvotes: 248
Reputation: 29966
Copying a shared_ptr involves copying its internal state (the pointer to the owned object and the pointer to the shared control block) and atomically incrementing the reference count. Moving it only involves transferring those pointers to the destination and nulling them in the source, so the reference count is untouched and the operation is faster.
Upvotes: 26
Reputation: 4420
There are two reasons for using std::move in this situation. Most responses addressed the issue of speed, but ignored the important issue of showing the code's intent more clearly.
For a std::shared_ptr, std::move unambiguously denotes a transfer of ownership of the pointee, while a simple copy operation adds an additional owner. Of course, if the original owner subsequently relinquishes their ownership (such as by allowing their std::shared_ptr to be destroyed), then a transfer of ownership has been accomplished.
When you transfer ownership with std::move, it's obvious what is happening. If you use a normal copy, it isn't obvious that the intended operation is a transfer until you verify that the original owner immediately relinquishes ownership. As a bonus, a more efficient implementation is possible, since an atomic transfer of ownership can avoid the temporary state where the number of owners has increased by one (and the attendant changes in reference counts).
Upvotes: 29
Reputation: 42944
Move operations (like the move constructor) for std::shared_ptr are cheap, as they basically "steal pointers": the whole state is transferred from source to destination, including the pointer to the control block with the reference count information.

Instead, copy operations on std::shared_ptr invoke an atomic reference count increase (i.e. not just ++RefCount on an integer RefCount data member, but e.g. calling InterlockedIncrement on Windows), which is more expensive than just stealing pointers/state.
So, analyzing the ref count dynamics of this case in detail:

// shared_ptr<CompilerInvocation> sp;
compilerInstance.setInvocation(sp);

If you pass sp by value and then take a copy inside the CompilerInstance::setInvocation method, you have:

1. The shared_ptr parameter is copy constructed: ref count atomic increment.
2. You copy the shared_ptr parameter into the data member: ref count atomic increment.
3. The shared_ptr parameter is destructed: ref count atomic decrement.

You have two atomic increments and one atomic decrement, for a total of three atomic operations.
Instead, if you pass the shared_ptr parameter by value and then std::move it inside the method (as properly done in Clang's code), you have:

1. The shared_ptr parameter is copy constructed: ref count atomic increment.
2. You std::move the shared_ptr parameter into the data member: the ref count does not change! You are just stealing pointers/state: no expensive atomic ref count operations are involved.
3. The shared_ptr parameter is destructed; but since you moved in step 2, there's nothing to destruct, as the shared_ptr parameter is not pointing to anything anymore. Again, no atomic decrement happens in this case.

Bottom line: in this case you get just one ref count atomic increment, i.e. just one atomic operation.

As you can see, this is much better than two atomic increments plus one atomic decrement (for a total of three atomic operations) for the copy case.
Upvotes: 83
Reputation: 92261
By using move you avoid increasing, and then immediately decreasing, the number of shared owners. That can save you some expensive atomic operations on the use count.
Upvotes: 150