MaiaVictor

Reputation: 53027

Why are background, idle threads slowing the main thread?

I'm on a notebook with an Apple M1, which has 4 cores. I have a problem that can be solved in parallel by dynamically spawning millions of threads. Since pthread_create has a huge overhead, I avoid it by instead leaving 3 threads running in the background. These threads wait for tasks to arrive:

void *worker(void *arg) {
  u64 tid = (u64)arg;

  // Loops until the main thread signals work is complete
  while (!atomic_load(&done)) {

    // If there is a new task for this worker...
    if (atomic_load(&workers[tid].stat) == WORKER_NEW_TASK) {

      // Execute it
      execute_task(&workers[tid]);

    }

  }

  return 0;
}

These threads are spawned with pthread_create once:

pthread_create(&workers[tid].thread, NULL, &worker, (void*)tid)

Any time I need a new task to be done, instead of calling pthread_create again, I just select an idle worker and send the task to it:

workers[tid].task = ...
atomic_store(&workers[tid].stat, WORKER_NEW_TASK)

The problem is: for some reason, leaving these 3 threads in the background makes my main thread 25% slower. Since my CPU has 4 cores, I expected these 3 threads not to affect the main thread at all.

Why are the background threads slowing down the main thread? Am I doing anything wrong? Is the while (!atomic_load(&done)) loop consuming a lot of CPU power?

Upvotes: 0

Views: 261

Answers (1)

MaiaVictor

Reputation: 53027

The issue is that I had a worker loop like this:

typedef struct {
  atomic_int has_work;
} Worker;

Worker workers[MAX_THREADS];

void *worker(void *arg) {
  u64 tid = (u64)arg;
  while (1) {
    if (atomic_load(&workers[tid].has_work)) {
      // do stuff
    }
  }
}

(...)

Notice how I used an atomic flag, has_work, to "wake up" the worker. As @WhozCraig pointed out in his comment, it may not be a good idea to spin-loop on atomic flags. If I understand correctly, the spinning thread keeps its core busy at 100%, which overwhelms the CPU. His suggestion was to use a mutex with condition variables, as described in "Programming with POSIX Threads" by Butenhof, section 3.3. This Stack Overflow answer has a snippet showing the common usage of that pattern. The resulting code should look like this:

typedef struct {
  int             has_work;
  pthread_mutex_t has_work_mutex;
  pthread_cond_t  has_work_signal;
} Worker;

Worker workers[MAX_THREADS];

void *worker(void *arg) {
  u64 tid = (u64)arg;
  while (1) {

    pthread_mutex_lock(&workers[tid].has_work_mutex);

    while (!workers[tid].has_work) {
      pthread_cond_wait(&workers[tid].has_work_signal, &workers[tid].has_work_mutex);
    }

    workers[tid].has_work = 0; // consume the flag so the next iteration sleeps again

    // do stuff

    pthread_mutex_unlock(&workers[tid].has_work_mutex);
  }
  return 0;
}

Notice how has_work is now a plain int instead of an atomic one, with a mutex guarding it. Then, a condition variable is associated with that mutex. This lets us use pthread_cond_wait to put the thread to sleep until another thread signals that has_work might be true. This has the same effect as the version using atomics, but the waiting thread sleeps instead of burning a core, so it is far less CPU-hungry and should perform better.

Upvotes: 2
