Reputation: 12136
I have a simple thread pool that needs to be able to wait for its workers to consume all the tasks in the queue without using standard thread::join()
, as I don't want to kill the threads themselves but just wait for them to complete all the tasks while staying in their execution loops. I thought of accomplishing this using condition_variable
s, like so:
ThreadPool.cpp
void ThreadPool::addTask(ThreadTaskFunction task, void *arg) {
std::lock_guard<std::mutex> t_lock(task_lock);
tasks.push_back(ThreadTask(task, arg));
// For every tasks that is added, I increment a counter
assignedTasks++;
}
void ThreadPool::join() {
std::unique_lock<std::mutex> lock(join_lock);
joinCondition.wait(lock, [this](){
return assignedTasks == 0;
});
}
Thread.cpp Note that this is a friend class to ThreadPool
void Thread::threadLoop() {
while (!shouldDie) {
ThreadTask task;
if (pool.hasTasks()) {
{
std::lock_guard<std::mutex> lock(pool.task_lock);
if (!pool.hasTasks()) continue;
task = pool.getTask();
}
task.function(task.argument);
this->completeTask();
}
}
this->isThreadFinished = true;
}
void Thread::completeTask() {
std::lock_guard<std::mutex> guard(pool.task_lock);
// When a task is completed, I decrement the counter and notify the main thread that a task has completed.
pool.assignedTasks--;
pool.joinCondition.notify_one();
}
I'm using this thing for a physics simulation, and this stuff happens on every step of said simulation (around once every 16ms). What happens is that after a few hundred steps, everything stops because somehow a notify signal is sent before the main thread enters the wait status, probably because it's checking the condition. If I debug, I can see that I have one task left on the counter, no tasks running on the threads, no tasks in the queue and nothing really happening anymore. Sometimes, even weirder, everything unlocks as soon as I pause execution with a breakpoint and restart it. I've tried putting more locks in place to no avail. Is there something obvious that I'm missing?
UPDATE It appears that the problem is solved by setting the assignedTasks
variable as volatile
. It's getting modified so often that sometimes the value in the registers is not updated and everything stops. Never had to do this before. Huh. :)
Upvotes: 1
Views: 235
Reputation: 42574
You are using two different mutexes (join_lock
, pool.task_lock
) to "protect" assignedTasks
at different times - this is a plain old-fashioned race condition.
Upvotes: 1