Colin Godsey

Reputation: 1321

In a multithreaded process on a system with multiple (physical) CPUs, how is thread scheduling handled?

Kind of a broad question, but I'm curious about the details of thread scheduling in a single process application on a machine with multiple physical CPUs.

EDIT - wanted to clarify that below I'm talking about physical CPUs. I've got a pretty good handle on how processes/threads work with a multicore CPU, but I'm talking about multiple physical CPU dies on the motherboard (like two 4-core Xeons).

ANSWER - thanks to the responses from brokenfoot and nosid, I think I've got it:

- The Linux scheduler has different NUMA policies that affect thread scheduling with regard to memory mutation/access patterns per core/die.
- Cache coherency across dies is possible, but slower, as expected.
- Best course of action: control mutability of shared memory (try to be immutable).
- Use an internal (in-process) task scheduler that respects the locality of threads.
- Use a NUMA policy that works with your in-process task scheduler.

Assumptions:

So the situation:

The questions:

Upvotes: 3

Views: 1943

Answers (1)

nosid

Reputation: 50044

Is cache coherency possible between multiple CPUs? Is it practical?

That's up to the programming language, compiler, and runtime environment. Together they ensure that your program can use several CPUs and still have consistent memory operations. For that purpose, programming languages typically define a so-called memory model.

How will Linux schedule the threads between CPUs? (If possible)

Without going into details: it typically uses all CPU cores. There is no static assignment between threads and cores; a thread can run for a while on one core and later on another. However, the Linux kernel tries to keep threads local to their memory, because systems with several CPU sockets have a non-uniform memory architecture (NUMA).

Is there some way to pin a process to a single CPU?

Yes, look for cpuset.

And ultimately... do I do one process per CPU and pin? Or one per box (which would be cool, if I don't screw myself with slow cross-CPU cache misses)?

If your application benefits from using shared memory, use one process per box. There is no disadvantage performance-wise.

Upvotes: 5
