Kulluk007
Kulluk007

Reputation: 972

Data race with std::unordered_map, despite locking insertions with mutex

I have a C++11 program that does some computations and uses a std::unordered_map to cache results of those computations. The program uses multiple threads and they use a shared unordered_map to store and share the results of the computations.

Based on my reading of unordered_map and STL container specs, as well as unordered_map thread safety, it seems that an unordered_map, shared by multiple threads, can handle one thread writing at a time, but many readers at a time.

Therefore, I'm using a std::mutex to wrap my insert() calls to the map, so that at most only one thread is inserting at a time.

However, my find() calls do not have a mutex as, from my reading, it seems that many threads should be able to read at once. However, I'm occasionally getting data races (as detected by TSAN), manifesting themselves in a SEGV. The data race clearly points to the insert() and find() calls that I mentioned above.

When I wrap the find() calls in a mutex, the problem goes away. However, I don't want to serialize the concurrent reads, as I'm trying to make this program as fast as possible. (FYI: I'm running using gcc 5.4.)

Why is this happening? Is my understanding of the concurrency guarantees of std::unordered_map incorrect?

Upvotes: 9

Views: 4655

Answers (2)

David Haim
David Haim

Reputation: 26496

There are several problems with that approach.

first, std::unordered_map has two overloads of find - one which is const, and one which is not.
I'd dare to say that I don't believe that that non-const version of find will mutate the map, but still for the compiler invoking non const method from a multiple threads is a data race and some compilers actually use undefined behavior for nasty optimizations.
so first thing - you need to make sure that when multiple threads invoke std::unordered_map::find they do it with the const version. that can be achieved by referencing the map with a const reference and then invoking find from there.

second, you miss the the part that many thread may invoke const find on your map, but other threads can not invoke non const method on the object! I can definitely imagine many threads call find and some call insert on the same time, causing a data race. imagine that, for example, insert makes the map's internal buffer reallocate while some other thread iterates it to find the wanted pair.

a solution to that is to use C++14 shared_mutex which has an exclusive/shared locking mode. when thread call find, it locks the lock on shared mode, when a thread calls insert it locks it on exclusive lock.

if your compiler does not support shared_mutex, you can use platform specific synchronization objects, like pthread_rwlock_t on Linux and SRWLock on Windows.

another possibility is to use lock-free hashmap, like the one provided by Intel's thread-building blocks library, or concurrent_map on MSVC concurrency runtime. the implementation itself uses lock-free algorithms which makes sure access is always thread-safe and fast on the same time.

Upvotes: 2

Galik
Galik

Reputation: 48625

You still need a mutex for your readers to keep the writers out, but you need a shared one. C++14 has a std::shared_timed_mutex that you can use along with scoped locks std::unique_lock and std::shared_lock like this:

using mutex_type = std::shared_timed_mutex;
using read_only_lock  = std::shared_lock<mutex_type>;
using updatable_lock = std::unique_lock<mutex_type>;

mutex_type mtx;
std::unordered_map<int, std::string> m;

// code to update map
{
    updatable_lock lock(mtx);

    m[1] = "one";
}

// code to read from map
{
    read_only_lock lock(mtx);

    std::cout << m[1] << '\n';
}

Upvotes: 8

Related Questions