The code below is used to assign work to multiple threads, wake them up, and wait until they are done. The "work" in this case consists of "cleaning a volume". What exactly this operation does is irrelevant for this question -- it just helps with the context. The code is part of a huge transaction processing system. void bf_tree_cleaner::force_all() { for (int i = 0; i < vol_m::MAX_VOLS; i++) { _requested_volumes[i] = true; } // fence here (seq_cst) wakeup_cleaners(); while (true) { usleep(10000); // 10 ms bool remains = false; for (int vol = 0; vol < vol_m::MAX_VOLS; ++vol) { // fence here (seq_cst) if (_requested_volumes[vol]) { remains = true; break; } } if (!remains) { break; } } } A value in a boolean array _requested_volumes[i] tells whether thread i has work to do. When it is done, the worker thread sets it to false and goes back to sleep. The problem I am having is that the compiler generates an infinite loop, where the variable remains is always true, even though all values in the array have been set to false. This only happens with -O3 . I have tried two solutions to fix that: Declare _requested_volumes volatile ( EDIT : this solution does work actually. See edit below) Many experts say that volatile has nothing to do with thread synchronization, and it should only be used in low-level hardware accesses. But there's a lot of dispute over this on the Internet. The way I understand it, volatile is the only way to refrain the compiler from optimizing away accesses to memory which is changed outside of the current scope, regardless of concurrent access . In that sense, volatile should do the trick, even if we disagree on best practices for concurrent programming. Introduce memory fences The method wakeup_cleaners() acquires a pthread_mutex_t internally in order to set a wake-up flag in the worker threads, so it should implicitly produce proper memory fences. But I'm not sure if those fences affect memory accesses in the caller method ( force_all() ). Therefore, I manually introduced fences in the locations specified by the comments above. This should make sure that writes performed by the worker thread in _requested_volumes are visible in the main thread. What puzzles me is that none of these solutions works, and I have absolutely no idea why. The semantics and proper use of memory fences and volatile is confusing me right now. The problem is that the compiler is applying an undesired optimization -- hence the volatile attempt. But it could also be a problem of thread synchronization -- hence the memory fence attempt. I could try a third solution in which a mutex protects every access to _requested_volumes , but even if that works, I would like to understand why, because as far as I understand, it's all about memory fences. Thus, it should make no difference whether it's done explicitly or implicitly via a mutex. EDIT: My assumptions were wrong and Solution 1 actually does work. However, my question remains in order to clarify the use of volatile vs. memory fences. If volatile is such a bad thing, that should never be used in multithreaded programming, what else should I use here? Do memory fences also affect compiler optimizations? Because I see these as two orthogonal issues, and therefore orthogonal solutions: fences for visibility in multiple threads and volatile for preventing optimizations.

c++multithreadinggccvolatilememory-fences

Reputation: 273

Volatile vs. memory fences

The code below is used to assign work to multiple threads, wake them up, and wait until they are done. The "work" in this case consists of "cleaning a volume". What exactly this operation does is irrelevant for this question -- it just helps with the context. The code is part of a huge transaction processing system.

void bf_tree_cleaner::force_all()
{
    for (int i = 0; i < vol_m::MAX_VOLS; i++) {
        _requested_volumes[i] = true;
    }
    // fence here (seq_cst)

    wakeup_cleaners();

    while (true) {
        usleep(10000); // 10 ms

        bool remains = false;
        for (int vol = 0; vol < vol_m::MAX_VOLS; ++vol) {
            // fence here (seq_cst)
            if (_requested_volumes[vol]) {
                remains = true;
                break;
            }
        }
        if (!remains) {
            break;
        }
    }
}

A value in a boolean array _requested_volumes[i] tells whether thread i has work to do. When it is done, the worker thread sets it to false and goes back to sleep.

The problem I am having is that the compiler generates an infinite loop, where the variable remains is always true, even though all values in the array have been set to false. This only happens with -O3.

I have tried two solutions to fix that:

Declare _requested_volumes volatile (EDIT: this solution does work actually. See edit below)

Many experts say that volatile has nothing to do with thread synchronization, and it should only be used in low-level hardware accesses. But there's a lot of dispute over this on the Internet. The way I understand it, volatile is the only way to refrain the compiler from optimizing away accesses to memory which is changed outside of the current scope, regardless of concurrent access. In that sense, volatile should do the trick, even if we disagree on best practices for concurrent programming.

Introduce memory fences

The method wakeup_cleaners() acquires a pthread_mutex_t internally in order to set a wake-up flag in the worker threads, so it should implicitly produce proper memory fences. But I'm not sure if those fences affect memory accesses in the caller method (force_all()). Therefore, I manually introduced fences in the locations specified by the comments above. This should make sure that writes performed by the worker thread in _requested_volumes are visible in the main thread.

What puzzles me is that none of these solutions works, and I have absolutely no idea why. The semantics and proper use of memory fences and volatile is confusing me right now. The problem is that the compiler is applying an undesired optimization -- hence the volatile attempt. But it could also be a problem of thread synchronization -- hence the memory fence attempt.

I could try a third solution in which a mutex protects every access to _requested_volumes, but even if that works, I would like to understand why, because as far as I understand, it's all about memory fences. Thus, it should make no difference whether it's done explicitly or implicitly via a mutex.

EDIT: My assumptions were wrong and Solution 1 actually does work. However, my question remains in order to clarify the use of volatile vs. memory fences. If volatile is such a bad thing, that should never be used in multithreaded programming, what else should I use here? Do memory fences also affect compiler optimizations? Because I see these as two orthogonal issues, and therefore orthogonal solutions: fences for visibility in multiple threads and volatile for preventing optimizations.

Upvotes: 1

Answers (2)

Useless

Reputation: 67733

Many experts say that volatile has nothing to do with thread synchronization, and it should only be used in low-level hardware accesses.

Yes.

But there's a lot of dispute over this on the Internet.

Not, generally, between "the experts".

The way I understand it, volatile is the only way to refrain the compiler from optimizing away accesses to memory which is changed outside of the current scope, regardless of concurrent access.

Nope.

Non-pure, non-constexpr non-inlined function calls (getters/accessors) also necessarily have this effect. Admittedly link-time optimization confuses the issue of which functions may really get inlined.

In C, and by extension C++, volatile affects memory access optimization. Java took this keyword, and since it can't (or couldn't) do the tasks C uses volatile for in the first place, altered it to provide a memory fence.

The correct way to get the same effect in C++ is using std::atomic.

In that sense, volatile should do the trick, even if we disagree on best practices for concurrent programming.

No, it may have the desired effect, depending on how it interacts with your platform's cache hardware. This is brittle - it could change any time you upgrade a CPU, or add another one, or change your scheduler behaviour - and it certainly isn't portable.

If you're really just tracking how many workers are still working, sane methods might be a semaphore (synchronized counter), or mutex+condvar+integer count. Either are likely more efficient than busy-looping with a sleep.

If you're wedded to the busy loop, you could still reasonably have a single counter, such as std::atomic<size_t>, which is set by wakeup_cleaners and decremented as each cleaner completes. Then you can just wait for it to reach zero.

If you really want a busy loop and really prefer to scan the array each time, it should be an array of std::atomic<bool>. That way you can decide what consistency you need from each load, and it will control both the compiler optimizations and the memory hardware appropriately.

Upvotes: 5

Serge Rogatch

Reputation: 15030

Apparently, volatile does the necessary for your example. The topic of volatile qualifier itself is too broad: you can start by searching "C++ volatile vs atomic" etc. There are a lot of articles and questions&answers on the internet, e.g. Concurrency: Atomic and volatile in C++11 memory model . Briefly, volatile tells the compiler to disable some aggressive optimizations, particularly, to read the variable each time it is accessed (rather than storing it in a register or cache). There are compilers which do more so making volatile to act more like std::atomic: see Microsoft Specific section here. In your case disablement of an aggressive optimization is exactly what was necessary.

However, volatile doesn't define the order for the execution of the statements around it. That is why you need memory order in case you need to do something else with the data after the flags you check have been set. For inter-thread communication it is appropriate to use std::atomic, particularly, you need to refactor _requested_volumes[vol] to be of type std::atomic<bool> or even std::atomic_flag: http://en.cppreference.com/w/cpp/atomic/atomic .

An article that discourages usage of volatile and explains that volatile can be used only in rare special cases (connected with hardware I/O): https://www.kernel.org/doc/Documentation/volatile-considered-harmful.txt

Upvotes: 0

Volatile vs. memory fences

Answers (2)

Related Questions