Gian Marco Toso
Gian Marco Toso

Reputation: 12136

C++11 condition_variable synchronisation issue

I have a simple thread pool that needs to be able to wait for its workers to consume all the tasks in the queue without using standard thread::join(), as I don't want to kill the threads themselves but just wait for them to complete all the tasks while staying in their execution loops. I thought of accomplishing this using condition_variables, like so:

ThreadPool.cpp

void ThreadPool::addTask(ThreadTaskFunction task, void *arg) {
    std::lock_guard<std::mutex> t_lock(task_lock);
    tasks.push_back(ThreadTask(task, arg));

    // For every tasks that is added, I increment a counter
    assignedTasks++;
}

void ThreadPool::join() {
    std::unique_lock<std::mutex> lock(join_lock);
    joinCondition.wait(lock, [this](){
        return assignedTasks == 0;
    });
}

Thread.cpp Note that this is a friend class to ThreadPool

void Thread::threadLoop() {
    while (!shouldDie) {
        ThreadTask task;

        if (pool.hasTasks()) {
            {
                std::lock_guard<std::mutex> lock(pool.task_lock);

                if (!pool.hasTasks()) continue;

                task = pool.getTask();
            }

            task.function(task.argument);
            this->completeTask();
        }
    }

    this->isThreadFinished = true;
}

void Thread::completeTask() {
    std::lock_guard<std::mutex> guard(pool.task_lock);

    // When a task is completed, I decrement the counter and notify the main thread that a task has completed.
    pool.assignedTasks--;    
    pool.joinCondition.notify_one();
}

I'm using this thing for a physics simulation, and this stuff happens on every step of said simulation (around once every 16ms). What happens is that after a few hundred steps, everything stops because somehow a notify signal is sent before the main thread enters the wait status, probably because it's checking the condition. If I debug, I can see that I have one task left on the counter, no tasks running on the threads, no tasks in the queue and nothing really happening anymore. Sometimes, even weirder, everything unlocks as soon as I pause execution with a breakpoint and restart it. I've tried putting more locks in place to no avail. Is there something obvious that I'm missing?

UPDATE It appears that the problem is solved by setting the assignedTasks variable as volatile. It's getting modified so often that sometimes the value in the registers is not updated and everything stops. Never had to do this before. Huh. :)

Upvotes: 1

Views: 235

Answers (1)

Casey
Casey

Reputation: 42574

You are using two different mutexes (join_lock, pool.task_lock) to "protect" assignedTasks at different times - this is a plain old-fashioned race condition.

Upvotes: 1

Related Questions