How linux process scheduler prevents starvation of a process

I have read that linux kernel contains many schedule classes each having it's own priority. To select a new process to run, the process scheduler iterates from the highest priority class to lowest priority class. If a runnable process is found in a class, the highest priority process is selected to run from that class.

Extract from Linux kernel development by Robert Love:

The main entry point into the process schedule is the function schedule() , defined in kernel/sched.c .This is the function that the rest of the kernel uses to invoke the process scheduler, deciding which process to run and then running it. schedule() is generic with respect to scheduler classes.That is, it finds the highest priority scheduler class with a runnable process and asks it what to run next. Given that, it should be no surprise that schedule() is simple.The only important part of the function—which is otherwise too uninteresting to reproduce here—is its invocation of pick_next_task() , also defined in kernel/sched.c .The pick_next_task() function goes through each scheduler class, starting with the highest priority, and selects the highest priority process in the highest priority class.

Let's imagine the following scenario. There are some processes waiting in lower priority classes and processes are being added to higher priority classes continuously. Won't the processes in lower priority classes starve?

Upvotes: 4

Answers (2)

Rachit Tayal

Reputation: 1292

Linux kernel implements Completely Fair Scheduling algorithm which is based on virtual clock.

Each scheduling entity has a sched_entity structure associated with it whose snapshot looks like

struct sched_entity {
    ...
    u64 exec_start;
    u64 sum_exec_runtime;
    u64 vruntime;
    u64 prev_sum_exec_runtime;
    ...
}

The above four attributes are used to track the runtime of a process and using these attributes along with some other methods(update_curr() where these are updated), the virtual clock is implemented. When a process is assigned to CPU, exec_start is updated to current time and the consumed CPU time is recorded in sum_exec_runtime. When process is taken off from CPU sum_exec_runtime value is preserved in prev_sum_exec_runtime. sum_exec_runtime is calculated cumulatively. (Meaning it grows monotonically).

vruntime stores the amount of time that has elapsed on virtual clock during process execution.

How vruntime is calculated?

Ignoring all the complex calculations, the core concept of how it is calculated is :-

vruntime += delta_exec_weighted;
delta_exec_weighted = delta_exec * (NICE_0_LOAD/load.weight);

Here delta_exec is the time difference between process assigned to CPU and taken off from CPU whereas load.weight is the weight of the process which depends on priority (Nice Value). Usually an increase in nice value of 1 for a process means it gets 10 percent less CPU time resulting in less weight. Process with NICE value 0, weight = 1024 Process re-Niced with value 1, weight = 1024/1.25 = 820(approx)

Points drawn from above

So vruntime increases when a process gets CPU
And vruntimeincreases slowly for higher priority processes compared with lower priority processes.

The runqueue is maintained in red-black tree and each runqueue has a min_vruntime variable associated with it that holds the smallest vruntime among all the process in the run-queue. (min_vruntime can only increase, not decrease as processes will be scheduled).

The key for the node in red black tree is process->vruntime - min_vruntime

When scheduler is invoked, the kernel basically picks up the task which has the smallest key (the leftmost node) and assigns it the CPU.

Elements with smaller key will be placed more to the left, and thus be scheduled more quickly.

When a process is running, its vruntime will steadily increase, so it will finally move rightwards in the red-black tree. Because vruntime is increase more slowly for more important processes, they will also move rightwards more slowly, so their chance to be scheduled is bigger for a less important process - just as required.
If a process sleeps, its vruntime will remain unchanged. Because the per-queue min_vruntime will increase in the meantime, the sleeper process will be placed more to the left after waking up because the key(mentioned above) got smaller.

Therefore there are no chances of starvation as a lower priority process if deprived of CPU, will have its vruntime smallest and hence key will be smallest so it moves to the left of the tree quickly and therefore scheduled.

Upvotes: 7

Tony Tannous

Reputation: 14876

It would indeed starve. There are many ways of dealing with such scenario.

Aging, the longer the process is in the system, increase its priority.
Scheduling algorithms giving every process a time-quantum to use the CPU. Time-Quantum varies, usually, interactive processes are given lower time-quantum as they spend more time doing I/O while time consuming/computational processes are given bigger time quantum. After a process runs its time quantum, it is put in an expired queue until there are no active processes in the system. Then, the expired queue becomes the active queue and vice versa. These are 2 ways in preventing starvation.

Upvotes: 1

How linux process scheduler prevents starvation of a process

Answers (2)

Related Questions