c++: spin lock or mutex comparison (simple calculations)

Question

Spin lock should have better performance than mutex for simple tasks. However, in this simple test (8 threads incrementing a counter), the results shows differently:

#include 
#include 
#include 
#include 
#include 

using namespace std;

class SpinLock {
private:
    atomic_flag lck = ATOMIC_FLAG_INIT;
public:
    void lock() { while(lck.test_and_set(memory_order_acquire)) {} }
    void unlock() { lck.clear(memory_order_release); }
};

int total = 0;

#ifdef SPINLOCK
SpinLock my_lock;
#else
mutex my_lock;
#endif

void foo(int n)
{
    for(int i = 0; i < 10000000; ++i) {
#ifdef SPINLOCK
        lock_guard lck(my_lock);
#else
        lock_guard lck(my_lock);
#endif
        ++total;
    }
}

int main()
{
    vector v;

    for(int i = 0; i < 8; ++i)
        v.emplace_back(foo, i);

    for(auto& t : v)
        t.join();

    cout << "total: " << total << endl;
    return 0;
}

To test spin lock:

$ g++ -DSPINLOCK -std=c++11 -Wall -pthread test.cc
$ time ./a.out
total: 80000000
real    0m18.206s
user    2m17.792s
sys     0m0.003s

To test mutex:

$ g++ -std=c++11 -Wall -pthread test.cc
$ time ./a.out
total: 80000000
real    0m9.483s
user    0m6.451s
sys     1m6.043s

The results show the mutex is almost two times faster than the spin lock. The spin lock spends most time in "user cpu" and the mutex spends most time in "sys cpu". How is the mutex implemented and should I use mutex instead of spin lock in the simple calculations like this? Can anyone explain the results?

The g++ is 4.8.2 and OS is Red Hat Enterprise Linux 7.

Thanks.

pqnet · Accepted Answer

some notes:

the time shown in time utility output is the CPU time your threads are using, not actual time. Spinlock uses cpu even during waiting, while kernel mutexes will execute other threads in other processes while waiting, not billing your process for that CPU time except for the one used to actually do the scheduling (the one you see in sys row in the mutex case).
for the same reason said above it may be that the total time you have to wait from the start of the process to the end is faster in the spinlock case, however your cpu may have higher usage, which is what you observed.
threading may be a good choice if you have a low collision chance, that is the time a thread spends in synchronized load is small compared to the load it can do asynchronously. If all your load is protected by a mutex there is only overhead in using threads at all - you should probably serialize it.

Spinlock are good if you have both low collision chance and low waiting time in case of collision. In your case you have 8 threads colliding to the same resource, and then requiring the resource to be available just as soon as you released it. This means that in average there will be 1 thread working and other 7 spinlocking, putting the total CPU time used at 8 times the needed time for single thread (if you have an 8 core machine and there is no other load on it). In mutex case thread are paused and woken up just when the resource is available, so there is no overhead for waiting, however locking a mutex will require some overhead as work to do in the kernel to keep track of which processes are waiting for the mutex, and even if it is not too big you are doing 160 million mutex operations, which totals to the sys time billed to your process

c++: spin lock or mutex comparison (simple calculations)

Answers (1)

Related Questions