Bill Sun

Reputation: 51

Are compiler optimizations solving thread safety issues?

I'm writing multi-threaded C++ code. While testing the overhead of different mutex locks, I found that the thread-unsafe code seems to yield the correct result when compiled with the Release configuration in Visual Studio, while running much faster than the code with a mutex lock. With the Debug configuration, however, the result is what I expected (a wrong count caused by the data race). I was wondering: is it the compiler that solved this, or does the Release build simply run so fast that no two threads ever access the memory at the same time?

My test code is pasted below.

#include <windows.h>
#include <mutex>
#include <vector>
#include <iostream>
using namespace std;

class Mutex {
public:
    unsigned long long _data;

    bool tryLock() {
        return mtx.try_lock();
    }
    inline void Lock() {
        mtx.lock();
    }
    inline void Unlock() {
        mtx.unlock();
    }
    void safeSet(const unsigned long long &data) {
        Lock();
        _data = data;
        Unlock();
    }
    Mutex& operator++() {           // pre-increment, protected by the mutex
        Lock();
        _data++;
        Unlock();
        return (*this);
    }
    Mutex operator++(int) {         // post-increment, protected by the mutex
        Mutex tmp = (*this);
        Lock();
        _data++;
        Unlock();
        return tmp;
    }
    Mutex() {
        _data = 0;
    }
private:
    std::mutex mtx;
    Mutex(Mutex& cpy) {
        _data = cpy._data;
    }
} val;

static DWORD64 val_unsafe = 0;

DWORD WINAPI safeThreads(LPVOID lParam) {
    for (int i = 0; i < 655360; i++) {
        ++val;                      // locked increment
    }
    return 0;
}

DWORD WINAPI unsafeThreads(LPVOID lParam) {
    for (int i = 0; i < 655360; i++) {
        val_unsafe++;               // unprotected increment
    }
    return 0;
}

int main()
{
    val._data = 0;
    vector<HANDLE> hThreads;
    LARGE_INTEGER freq, time1, time2;
    QueryPerformanceFrequency(&freq);

    // Time the mutex-protected threads.
    QueryPerformanceCounter(&time1);
    for (int i = 0; i < 32; i++) {
        hThreads.push_back(CreateThread(0, 0, safeThreads, 0, 0, 0));
    }
    for (HANDLE handle : hThreads) {
        WaitForSingleObject(handle, INFINITE);
    }
    QueryPerformanceCounter(&time2);
    cout << time2.QuadPart - time1.QuadPart << endl;

    hThreads.clear();

    // Time the unprotected threads.
    QueryPerformanceCounter(&time1);
    for (int i = 0; i < 32; i++) {
        hThreads.push_back(CreateThread(0, 0, unsafeThreads, 0, 0, 0));
    }
    for (HANDLE handle : hThreads) {
        WaitForSingleObject(handle, INFINITE);
    }
    QueryPerformanceCounter(&time2);
    cout << time2.QuadPart - time1.QuadPart << endl;

    hThreads.clear();
    cout << val._data << endl << val_unsafe << endl;
    cout << freq.QuadPart << endl;
    return 0;
}

Upvotes: 1

Views: 166

Answers (1)

Christophe

Reputation: 73406

The standard doesn't let you assume that code is thread-safe by default. Nevertheless, your code gives the correct result when compiled in Release mode for x64.

But why?

If you look at the assembler listing generated for your code (MSVC can emit one with the /FA family of switches), you'll find that the optimizer collapsed the whole loop through constant propagation. So instead of looping 655360 times, it just adds that constant to your counter once:

?unsafeThreads@@YAKPEAX@Z PROC              ; unsafeThreads, COMDAT
; 74   :    for (int i = 0; i < 655360; i++) {
    add QWORD PTR ?val_unsafe@@3_KA, 655360 ; 000a0000H   <======= HERE 
; 75   :        val_unsafe++;
; 76   :    }
; 77   :    return 0;
    xor eax, eax                             
; 78   : }
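
In C++ terms, the optimized function behaves roughly as if it had been written like this (a paraphrase of the listing above, not the literal generated code):

DWORD WINAPI unsafeThreads(LPVOID lParam) {
    val_unsafe += 655360;   // the whole loop collapsed into a single add
    return 0;
}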

In this situation, with a single and very fast instruction in each thread, a data race is much less probable: most likely one thread has already finished before the next one is even launched.

How to see the expected result from your benchmark?

If you want to prevent the optimizer from collapsing your test loops, you need to declare _data and val_unsafe as volatile (see the sketch after the numbers below). You'll then notice that the unsafe value is no longer correct, due to data races. Running my own tests with this modified code, I get the correct value for the safe version and always different (and wrong) values for the unsafe version. For example:

safe time: 5672583
unsafe time: 145092                  // <=== much faster
val: 20971520
val_unsafe: 3874844                  // <=== OUCH !!!!
freq: 2597654
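
For reference, here is a minimal stand-alone sketch of the unsafe side with that change applied (my own repro, reusing the names from the question; the mutex-protected side stays exactly as posted):

#include <windows.h>
#include <iostream>

// 'volatile' forces the compiler to keep every individual increment, so the
// non-atomic read-modify-write sequences (and hence the data races) reappear.
static volatile DWORD64 val_unsafe = 0;

DWORD WINAPI unsafeThreads(LPVOID) {
    for (int i = 0; i < 655360; i++) {
        val_unsafe++;                         // load, add, store: not atomic
    }
    return 0;
}

int main() {
    HANDLE hThreads[32];
    for (int i = 0; i < 32; i++) {
        hThreads[i] = CreateThread(0, 0, unsafeThreads, 0, 0, 0);
    }
    WaitForMultipleObjects(32, hThreads, TRUE, INFINITE);
    std::cout << val_unsafe << std::endl;     // almost always less than 32 * 655360
    return 0;
}

Note that volatile is used here only to keep the optimizer from removing the loop; it does not make the increments thread-safe.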

Want to make your unsafe code safe?

If you want to make your unsafe code safe without using an explicit mutex, you could just make val_unsafe a std::atomic (see the sketch after the numbers below). The result will be platform dependent (the implementation could very well introduce a mutex for you), but on the same machine as above, with MSVC15 in Release mode, I get:

safe time: 5616282
unsafe time: 798851                  // still much faster (6 to 7 times on average)
val: 20971520
val_unsafe: 20971520                 // but always correct
freq: 2597654
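
Again a minimal stand-alone sketch of that variant (my own repro, keeping the original variable name so it lines up with the numbers above):

#include <windows.h>
#include <atomic>
#include <iostream>

// std::atomic turns each increment into an atomic read-modify-write (on x64
// typically a single lock-prefixed add), so the count stays exact without an
// explicit mutex.
static std::atomic<DWORD64> val_unsafe{ 0 };

DWORD WINAPI unsafeThreads(LPVOID) {
    for (int i = 0; i < 655360; i++) {
        ++val_unsafe;                         // atomic increment
    }
    return 0;
}

int main() {
    HANDLE hThreads[32];
    for (int i = 0; i < 32; i++) {
        hThreads[i] = CreateThread(0, 0, unsafeThreads, 0, 0, 0);
    }
    WaitForMultipleObjects(32, hThreads, TRUE, INFINITE);
    std::cout << val_unsafe.load() << std::endl;  // always 32 * 655360 = 20971520
    return 0;
}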

The only thing you then still have to do: rename the atomic version of the variable from val_unsafe to also_safe_val ;-)

Upvotes: 6
