Yuushi
Yuushi

Reputation: 26040

volatile and multithreading on platform with strong memory models

Almost any time we're dealing with multithreading, volatile is misused (at least for any cross-platform code). I'm curious however if using a volatile bool would give (non-deterministic), but correct results, when used on an architecture such as x86(-64) where (single) loads and stores on (correctly aligned) word-sized values are atomic (hence a read cannot see an intermediate value; it will always see either true or false).

As an example, consider something like the following multithreaded linear search:

#include <algorithm>
#include <future>
#include <thread>
#include <vector>

namespace detail
{

template <typename Iterator>
std::vector<std::pair<Iterator, Iterator>> split_range(
    Iterator begin, Iterator end, unsigned nranges, std::random_access_iterator_tag
)
{
    std::vector<std::pair<Iterator, Iterator>> split(nranges);
    auto diff = end - begin;
    for(auto i = 0U; i < nranges - 1; ++i) {
        split[i] = std::make_pair(begin, begin + (diff / nranges));
        begin += (diff / nranges);
    }
    split[nranges - 1] = std::make_pair(begin, end);
    return split;
}

template <typename Iterator> 
Iterator async_find_impl(
    std::pair<Iterator, Iterator> range,
    typename std::iterator_traits<Iterator>::value_type value,
    volatile bool* finished
)
{
    auto begin = range.first;
    auto end = range.second;

    while(!(*finished) && (begin != end)) {
        if(*begin == value) {
            *finished = true;
            return begin;
        }
        ++begin;
    }
    return end;
}

}

template <typename Iterator>
std::vector<std::pair<Iterator, Iterator>> split_range(
    Iterator begin, Iterator end, unsigned ranges
)
{
    return detail::split_range(
        begin, end, ranges, 
        typename std::iterator_traits<Iterator>::iterator_category()
    );
}


template <typename Iterator>
Iterator async_find(
    Iterator begin, Iterator end, 
    typename std::iterator_traits<Iterator>::value_type value
)
{
    volatile bool found = false;
    const static auto default_launch = 4U;
    unsigned to_launch = std::thread::hardware_concurrency();
    if(to_launch == 0) { to_launch = default_launch; }

    std::vector<std::future<Iterator>> futures;
    auto ranges = split_range(begin, end, to_launch);

    for(auto&& range : ranges) {
        auto func = [&]() { return detail::async_find_impl(range, value, &found); };
        futures.emplace_back(std::async(func));
    }

    std::vector<Iterator> results;
    for(auto&& future : futures) {
        results.emplace_back(future.get());
    }

    for(auto i = 0U; i < results.size(); ++i) {
        if(results[i] != ranges[i].second) {
            return results[i];
        }
    }

    return end;
}

Obviously, the correct cross platform way of doing something such as this is to use a std::atomic<bool>. However, given the assumptions listed above, would this be "correct" (where correct is returning any Iterator that points to the given value; if there are multiple possibilities, essentially it is non-deterministic which one will be returned)?

Upvotes: 0

Views: 105

Answers (1)

Pete Becker
Pete Becker

Reputation: 76245

Atomics in C and C++ address three issues: tearing, reordering, and visibility. Tearing occurs when a thread switch happens in the middle of a read or write that requires more than one bus cycle (e.g., writing a 64-bit value when running on a 32-bit processor), and the newly-running thread writes or reads (respectively) that value. Reordering can be done by the compiler and by the processor. Visibility issues come up because each processor has its own local data cache; writes from one processor are not visible to another processor until the new values have been written to main memory and then read into the other processor's local cache.

A bool value is probably stored in a chunk of memory that's small enough that it won't get torn, although that's not required. volatile tells the compiler not to reorder reads and writes (and not to elide them). Neither of those addresses visibility. So using volatile bool might give you the result you want, although without any assurance of when the updated value might be seen. But why bother? Instead of hand-rolling your own half measures, just use atomic<bool>. It takes care of all three issues, and will be written to take advantage of the underlying hardware, perhaps in ways that you haven't thought of.

Upvotes: 2

Related Questions