Reputation: 30028
Note: this is not a practical problem (I have never locked more than 2 mutexes with scoped_lock); I am mostly curious why implementations of scoped_lock apparently take a relatively big performance hit when the mutexes are passed in a different order.
Example code below, godbolt link.
#include <mutex>
#include <thread>
#include <chrono>
#include <iostream>

std::mutex m1, m2, m3, m4, m5, m6;
int cnt = 0;

void f(){
    for (int i = 0; i < 500*1000; ++i){
        std::scoped_lock sl{m1, m2, m3, m4, m5, m6};
        cnt++;
    }
}

void f_unord(){
    for (int i = 0; i < 500*1000; ++i){
        std::scoped_lock sl{m4, m5, m6, m1, m2, m3};
        cnt++;
    }
}

int main(){
    for (int run = 0; run < 4; ++run)
    {
        {
            const auto start = std::chrono::steady_clock::now();
            std::thread t1(f), t2(f);
            t1.join();
            t2.join();
            const auto end = std::chrono::steady_clock::now();
            std::cout << "same lock order: " << std::chrono::duration_cast<std::chrono::milliseconds>(end-start).count() << std::endl;
            std::cout << cnt << std::endl;
        }
        {
            const auto start = std::chrono::steady_clock::now();
            std::thread t1(f), t2(f_unord);
            t1.join();
            t2.join();
            const auto end = std::chrono::steady_clock::now();
            std::cout << "different lock order: " << std::chrono::duration_cast<std::chrono::milliseconds>(end-start).count() << std::endl;
            std::cout << cnt << std::endl;
        }
    }
}
Note on why this is surprising: since mutexes are not movable, I would expect implementations to simply sort the mutexes by address and lock them in that order.
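For illustration, a minimal sketch of what I mean by address-ordered locking (the helper name lock_in_address_order and the fixed three-mutex arity are made up here; this is not what any implementation actually provides):

#include <algorithm>
#include <array>
#include <functional>
#include <mutex>

// Hypothetical helper, for illustration only: lock three mutexes in
// increasing address order, so that every thread acquires them in the
// same global order no matter how they were passed in.
void lock_in_address_order(std::mutex& a, std::mutex& b, std::mutex& c)
{
    std::array<std::mutex*, 3> ms{&a, &b, &c};
    // std::less on pointers gives a total order even for unrelated objects
    std::sort(ms.begin(), ms.end(), std::less<std::mutex*>{});
    for (std::mutex* m : ms)
        m->lock();   // consistent global order => no deadlock
}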
Note on benchmarking on godbolt: I know godbolt is not reliable, but I get similar results on my machine in a VM:
g++ --version; g++ -O2 -std=c++17 scoped_lock.cpp -pthread; ./a.out
g++ (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
different lock order: 1074
1000000
same lock order: 602
2000000
different lock order: 987
3000000
same lock order: 612
4000000
different lock order: 1012
5000000
same lock order: 585
6000000
different lock order: 1050
7000000
same lock order: 675
8000000
different lock order: 1107
9000000
same lock order: 609
10000000
Upvotes: 2
Views: 670
Reputation: 603
It is tied to the implementation. In typical implementations, std::scoped_lock delegates to std::lock when it is given more than one mutex.
When you look at the std::lock doc:
The objects are locked by an unspecified series of calls to lock, try_lock, and unlock. If a call to lock or unlock results in an exception, unlock is called for any locked objects before rethrowing
gcc's implementation of std::lock is:
  void
  lock(_L1& __l1, _L2& __l2, _L3&... __l3)
  {
    while (true)
      {
        using __try_locker = __try_lock_impl<0, sizeof...(_L3) != 0>;
        unique_lock<_L1> __first(__l1);
        int __idx;
        auto __locks = std::tie(__l2, __l3...);
        __try_locker::__do_try_lock(__locks, __idx);
        if (__idx == -1)
          {
            __first.release();
            return;
          }
      }
  }
As you can see, if both threads pass the mutexes in the same order, this is simple: one thread blocks on __l1 while the other holds it, so once a thread owns __l1 the subsequent try_locks succeed on the first attempt.
In the second case, where the two threads start with different mutexes, each thread can grab its own first mutex, the try_locks then fail, everything is unlocked and retried, and it can take some time before this stabilizes (in theory it might never happen)...
Upvotes: 0
Reputation: 218750
As others have stated, it is tied to the implementation. But the implementation can do better than gcc's, which persistently retries the same thing over and over again.
Persistent
Lock the first lock and then try_lock the rest. If any of the try_locks fail, unlock everything and try again.
This algorithm works best if both threads list their mutexes in the same order.
For a higher-performing and more robust algorithm, the implementation should use what this paper calls Smart & Polite.
Smart & Polite
Lock the first lock and then try_lock the rest. If any of the try_locks fail, unlock everything, then yield, then retry except the first lock is done on the one that previously failed the try_lock.
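Here is a minimal two-lockable sketch of that idea (the function name smart_and_polite_lock is made up for illustration; this is not the actual libc++ or MSVC source):

#include <mutex>
#include <thread>

// Sketch of the "Smart & Polite" strategy for two lockables:
// block on one, try_lock the other; on failure release everything,
// yield, and retry starting with the lockable that just failed.
template <class L0, class L1>
void smart_and_polite_lock(L0& l0, L1& l1)
{
    while (true)
    {
        {
            std::unique_lock<L0> u0(l0);   // block on the first lockable
            if (l1.try_lock())
            {
                u0.release();              // keep both locked; give up ownership
                return;
            }
        }                                  // l0 released here ("unlock everything")
        std::this_thread::yield();         // be polite before retrying
        {
            std::unique_lock<L1> u1(l1);   // now block on the one that failed
            if (l0.try_lock())
            {
                u1.release();
                return;
            }
        }
        std::this_thread::yield();
    }
}

With this ordering rule, after a failed try_lock a thread blocks on the mutex that was actually contended, which is why the algorithm converges quickly even when threads list their mutexes in opposite orders.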
The paper shows that this algorithm never performs worse than any other, and often performs much better. This analysis includes the more traditional algorithm that sorts the lockables into a global order, and then locks them in that order (therein labeled Ordered).
libc++ and Visual Studio both use Smart & Polite. gcc's libstdc++ uses Persistent.
When using clang on non-Apple platforms, use -stdlib=libc++ to choose libc++ over gcc's std::lib.
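For example, adapting the question's command line (assuming libc++ and libc++abi are installed):

clang++ --version; clang++ -O2 -std=c++17 -stdlib=libc++ scoped_lock.cpp -pthread; ./a.out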
Read Dining Philosophers Rebooted for an in-depth performance analysis of these algorithms for std::lock.
Upvotes: 5
Reputation: 889
When both threads use the same mutex order, no deadlock can occur. Thread t2 can only proceed with locking m1 if thread t1 has not yet locked m1, and vice versa.
In the case that you use a different order for the two threads, a deadlock situation can arise: thread t1 has locked m1, m2, and m3 and tries to lock m4, m5, and m6, while at the same time thread t2 has locked m4, m5, and m6 and tries to lock m1, m2, and m3. Neither thread can proceed, and the deadlock has to be resolved.
In this case, one of the scoped locks has to release the mutexes it has already acquired in order to avoid the deadlock. The other thread can then continue, the releasing thread has to acquire all of its mutexes again, and on the next iteration the same thing happens again.
Upvotes: 2