Colin Godsey

Reputation: 1321

In a multithreaded process on a system with multiple (physical) CPUs, how is thread scheduling handled?

Kind of a broad question, but I'm curious about the details of thread scheduling in a single process application on a machine with multiple physical CPUs.

EDIT - wanted to clarify that below I'm talking about physical CPUs. I've got a pretty good handle on how processes/threads work with a multicore CPU, but I'm talking about multiple physical CPU dies on the motherboard (like two 4-core Xeons).

ANSWER - thanks to the responses from brokenfoot and nosid, I think I've got it:

- The Linux scheduler has different NUMA policies that affect thread scheduling with regard to memory mutation/access patterns per core/die.
- Cache coherency across dies is possible, but slower, as expected.
- Best course of action: control mutability of shared memory (try to be immutable).
- Use an internal (in-process) task scheduler that respects the locality of threads.
- Use a NUMA policy that works with your in-process task scheduler.

Assumptions:

So the situation:

The questions:

Upvotes: 3

Views: 1943

Answers (1)

nosid

Reputation: 50044

Is cache coherency possible between multiple CPUs? Is it practical?

That's up to the programming language, compiler, and runtime environment. Together they ensure that your program can use several CPUs and still have consistent memory operations. For that purpose, programming languages typically define a so-called memory model.

How will Linux schedule the threads between CPUs? (If possible)

Without going into details: it typically uses all CPU cores. There is no static assignment between threads and cores; a thread can run for a while on one core and later on another. However, the Linux kernel tries to keep threads local to their memory, because systems with several CPU sockets have a non-uniform memory architecture (NUMA).

Is there some way to pin a process to a single CPU?

Yes, look for cpuset.

And ultimately... do I do one process per CPU and pin? Or one per box (which would be cool, if I don't screw myself with slow cross-CPU cache misses)?

If your application benefits from using shared memory, use one process per box. There is no disadvantage performance-wise.

Upvotes: 5
