Reputation: 20396
In my code, I dispatch tasks to my thread pool with a construct that more-or-less looks like this:
working_data get_data(my_thread_pool & thread_pool) {
    size_t thread_pool_size = thread_pool.size();
    std::vector<working_data> data(thread_pool_size);
    std::vector<std::promise<void>> promises(thread_pool_size);
    std::mutex data_0_mutex;
    for(size_t i = 0; i < thread_pool_size; i++) {
        thread_pool.post_task([&, i] {
            std::unique_lock<std::mutex> lock(data_0_mutex, std::defer_lock);
            if(i == 0)
                lock.lock();
            data[i].add_data(process_data());
            if(i != 0) {
                lock.lock();
                data[0].merge_from(data[i]);
            }
            promises[i].set_value();
        });
    }
    for(size_t i = 0; i < thread_pool_size; i++) {
        promises[i].get_future().wait();
    }
    return std::move(data[0]);
}
This code doesn't fail on every run, but it frequently causes an access violation.
Conversely, the following code never causes access violations, but I'd rather not use it because I don't like the polling loop at the end:
working_data get_data(my_thread_pool & thread_pool) {
    size_t thread_pool_size = thread_pool.size();
    std::vector<working_data> data(thread_pool_size);
    std::vector<std::atomic_bool> flags(thread_pool_size);
    std::mutex data_0_mutex;
    for(size_t i = 0; i < thread_pool_size; i++) {
        thread_pool.post_task([&, i] {
            std::unique_lock<std::mutex> lock(data_0_mutex, std::defer_lock);
            if(i == 0)
                lock.lock();
            data[i].add_data(process_data());
            if(i != 0) {
                lock.lock();
                data[0].merge_from(data[i]);
            }
            flags[i].store(true, std::memory_order_relaxed);
        });
    }
    for(size_t i = 0; i < thread_pool_size; i++) {
        while(!flags[i].load(std::memory_order_relaxed))
            std::this_thread::yield();
    }
    return std::move(data[0]);
}
Note that the main difference is that I'm using std::atomic_bool instead of std::promise and std::future.
It's quite possible that the error originates somewhere else in my [2000+ lines of] code, but I'd like to know if there's an obvious mistake solely in the code I'm presenting here.
One other thing I've observed: I can only reproduce the bug with optimizations turned on; with optimizations off, it never occurs.
The call stack at the point of the access violation is unreadable.
Upvotes: 2
Views: 603
Reputation: 32732
The problem is one of timing.
When the last thread sets its promise's value, the wait loop in the main thread can finish and get_data can return, which destroys the local data_0_mutex. That last thread may still be running, and when the destructor of its lock runs, the mutex it references has been (or is being) destroyed.
This is less likely in your "polling" version, because the main thread has to wait for the OS to resume it, but it could still happen.
The solution is to release the lock before setting the promise: call lock.unlock(); before promises[i].set_value();.
Upvotes: 1