ballsmahoney

Reputation: 151

Does the Linux scheduler need to execute on every core in a multicore?

Is it possible for a kernel scheduler thread executing on one core to handle the scheduling of threads/processes for a different, remote core, or does every core need its own kernel scheduler thread to decide which thread/process executes on it at any given time? For reference, I am curious about how this would work for an OS implementation similar to [1], which proposes dedicating some cores of a multicore to applications and others to kernel tasks.

[1] Q. Yuan, J. Zhao, M. Chen and N. Sun, "GenerOS: An asymmetric operating system kernel for multi-core systems," 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010, pp. 1-10, doi: 10.1109/IPDPS.2010.5470363.

Upvotes: 1

Views: 406

Answers (2)

Peter Cordes

Reputation: 363852

Normally scheduling is a distributed algorithm, with kernel code on each core deciding what that core runs next. That's generally efficient because the core can just context-switch to the new task itself, without sending an IPI to another core that might in the meantime have started doing something else. Also, some kernel data structures for the task we're switching to will already be hot in L1d cache on this core, because the scheduler code was just looking at them.
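A minimal sketch of that idea (not actual Linux code; all the names here are made up for illustration): each core has its own runqueue, and each core's timer tick runs something like this locally, so both the decision and the switch happen on the core concerned.

```c
struct task;

struct runqueue {
    struct task *current;      /* task currently running on this core */
    struct task *queue[64];    /* simplified fixed-size ready list */
    int          nr_ready;
};

/* one runqueue per core, indexed by core id (hypothetical layout) */
extern struct runqueue per_core_rq[];
extern int this_core_id(void);
extern struct task *pick_highest_priority(struct runqueue *rq);
extern void context_switch(struct task *prev, struct task *next);

void schedule_on_this_core(void)
{
    struct runqueue *rq   = &per_core_rq[this_core_id()];
    struct task     *prev = rq->current;
    struct task     *next = pick_highest_priority(rq);  /* local decision */

    if (next && next != prev) {
        rq->current = next;
        context_switch(prev, next);   /* switch right here, no IPI needed;
                                         next's data is likely still hot in
                                         this core's L1d cache */
    }
}
```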

But that's not the only possible design (as discussed in comments), and there are cases where mainstream OSes do tell another core what it should run next:


Most ISAs allow one core to send an inter-processor interrupt (IPI) to another core. With some associated data, you can pass messages. So instead of just asking another core to run the scheduler, the message could be to context-switch to a given task-id.

Linux already uses IPIs for things like TLB shootdowns, where some code needs to run on certain other cores (ones that were recently running threads of a process). And maybe for stuff like run_on(), which RCU uses to make sure the deallocation task has run on each core in the system, so there definitely aren't still any stalled readers in the middle of accessing old copies of data structures it wants to free.

Also, IPIs to trigger another core to at least run its scheduler are a thing when there's too much work on one core. e.g. after a fork() or clone(), if there are sleeping cores, the core handling the system call probably picks a sleeping core and sends it an IPI right away, instead of waiting for that core's timer interrupt to wake it up and have it notice the waiting tasks. (A rough sketch of that follows below.)
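Here is a rough sketch of that "kick another core" path, using hypothetical helper names (the real Linux paths are more involved, e.g. smp_send_reschedule() on the sending side and an IPI handler that runs the scheduler on the target):

```c
struct task;

extern int  find_idle_core(void);                      /* hypothetical */
extern void enqueue_task_on(int core, struct task *t); /* hypothetical */
extern void send_ipi(int core, int vector);            /* arch-specific IPI send */

#define RESCHEDULE_VECTOR  0xfd   /* hypothetical IPI vector number */

/* Called on the core that handled fork()/clone(): instead of keeping the
 * new task locally, park it on an idle core's runqueue and poke that core
 * immediately rather than waiting for its next timer tick. */
void wake_idle_core_for(struct task *new_task)
{
    int target = find_idle_core();
    if (target >= 0) {
        enqueue_task_on(target, new_task);
        send_ipi(target, RESCHEDULE_VECTOR);  /* target's IPI handler runs
                                                 its scheduler and picks up
                                                 new_task */
    }
}
```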

Upvotes: 1

bazza

Reputation: 8394

Where it gets interesting, I think, is interrupts. These are used for all sorts of things, but one is the scheduler tick. In a preemptively scheduled OS, a timer interrupt is used to interrupt the flow of a program and pass control to the OS, so that it can decide whether there's something more important for the core to do instead; once the decision is made, control is passed back to whichever program is chosen to run.

If that timer interrupt is routed to an ISR on a scheduling core, it does not interrupt the flow of a program on another core. If a context switch between programs is needed, the program core somehow has to be stopped, one process context saved, and another process context restored. Interrupts are the typical way in which program flow is stopped, so I'd imagine the scheduler core would have to raise another interrupt on the program core to allow the context switch to be done.

So, possibly, the scheduling decision need not be done on the core(s) concerned, but the act of context switching feels like it has to be done where the program was running.
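A hypothetical sketch of that split, with the policy on a dedicated scheduler core and only the mechanical switch on the program core (all names here are made up, not from any real kernel):

```c
struct task;

extern void send_ipi_with_arg(int core, void (*handler)(void *), void *arg); /* hypothetical */
extern struct task *decide_next_task_for(int core);      /* policy lives on the scheduler core */
extern struct task *current_task_on_this_core(void);
extern void save_context(struct task *t);
extern void restore_context(struct task *t);              /* does not return */

/* Runs on the program core, inside the interrupt raised by the scheduler
 * core: program flow is already stopped here, so we can swap contexts. */
static void context_switch_ipi_handler(void *arg)
{
    struct task *next = arg;
    struct task *prev = current_task_on_this_core();

    if (next != prev) {
        save_context(prev);
        restore_context(next);
    }
}

/* Runs on the dedicated scheduler core: decision only, no switching. */
void scheduler_core_tick(int program_core)
{
    struct task *next = decide_next_task_for(program_core);
    send_ipi_with_arg(program_core, context_switch_ipi_handler, next);
}
```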

The architectural decision to be made then in OS design is whether having two ISRs to do scheduling calculations and context switching separately is better or worse than one that does everything in-situ, on the same core as the program.

Upvotes: 0
