Reputation: 3336
I'm trying to understand better how Linux's CFS (Completely Fair Scheduler) works behind the scenes to make some improvements on the Kubernetes side.
Well, let's imagine I have a processor with only 1 core. That means it can execute only 1 task at a time, no matter what; that's how a processor works. Since Linux kernel 2.6.23, the default scheduler is CFS (Completely Fair Scheduler), which tries to be fair by giving all processes the same share of CPU time.
Then, for the same 1-core processor, I have 2 processes. In that case, CFS will try to give each process 50% of the core (1/2 = 0.5). I know it's more complex than that: priorities and weights define each task's virtual runtime, so CFS can pick the correct one, the task with the least virtual runtime, from its time-ordered run queue (a red-black tree in the kernel).
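To check that I have the selection part right, here is a tiny sketch of my understanding in Python. It uses a `heapq` as a stand-in for the kernel's red-black tree (both give "pick the minimum"), and the `Task` fields and weight handling are simplified assumptions, not the kernel's actual bookkeeping:

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Task:
    vruntime: float                      # weighted runtime so far, in ns
    name: str = field(compare=False)
    weight: float = field(compare=False, default=1.0)  # stand-in for nice-level weight

# Runnable tasks ordered by vruntime (the kernel uses a red-black tree;
# a heap gives the same "pop the minimum" behaviour for this sketch).
runqueue = []
heapq.heappush(runqueue, Task(0.0, "P1"))
heapq.heappush(runqueue, Task(0.0, "P2", weight=2.0))  # higher weight = higher priority

def run_one_slice(slice_ns: float) -> None:
    task = heapq.heappop(runqueue)           # least vruntime runs next
    task.vruntime += slice_ns / task.weight  # heavier tasks accrue vruntime slower
    print(f"ran {task.name}, vruntime now {task.vruntime:.0f}")
    heapq.heappush(runqueue, task)           # back into the run queue

for _ in range(4):
    run_one_slice(10_000_000)  # 10ms slices
```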
Now I know how CFS chooses the correct process to run (the one with the least virtual runtime) and dispatches it to the processor core.
The next part isn't clear to me, so I need your help to clarify it. Here is where things got confusing.
Let's say I have the same 2 processes (P1 and P2) and a 1-core processor. P1 needs 50ms to finish its job, and P2 needs 100ms. Ignoring CFS and sending P1 straight to the processor core would block P2 for 50ms, which means P2 takes 150ms total: 50ms blocked by P1 + 100ms of its own CPU burst time. Like this diagram:
When CFS sets sched_latency_ns=10000000 (10ms), I understand that to mean no process can run for more than 10ms before the scheduler switches to another one. So, look at my diagram: in that case P1 takes roughly 100ms to finish, because it spends about half its time preempted by P2, but on the other hand P2 starts making progress almost immediately instead of waiting 50ms for P1 to release the CPU. It's fairer for sure.
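A quick way to check these numbers is to simulate both policies. This is a minimal sketch under simplifying assumptions (fixed 10ms slices, strict alternation, zero context-switch cost), not how the kernel actually computes slices:

```python
# Compare run-to-completion (FCFS) vs. 10ms round-robin time slicing for
# two CPU-bound jobs on one core. Simplified: fixed slices, no switch cost.

def fcfs(jobs):
    """Run each job to completion in order; return finish times (ms)."""
    finish, now = {}, 0
    for name, need in jobs:
        now += need
        finish[name] = now
    return finish

def round_robin(jobs, slice_ms=10):
    """Alternate fixed slices between runnable jobs; return finish times (ms)."""
    remaining = dict(jobs)
    finish, now = {}, 0
    while remaining:
        for name in list(remaining):
            ran = min(slice_ms, remaining[name])
            now += ran
            remaining[name] -= ran
            if remaining[name] == 0:
                finish[name] = now
                del remaining[name]
    return finish

jobs = [("P1", 50), ("P2", 100)]
print("FCFS:       ", fcfs(jobs))         # {'P1': 50, 'P2': 150}
print("round robin:", round_robin(jobs))  # {'P1': 90, 'P2': 150}
```

P2 finishes at 150ms either way, but it starts making progress after 10ms instead of 50ms, and P1's completion moves from 50ms to about 90-100ms, which matches the trade-off described above.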
Now, when Kubernetes comes into play, I can use a different unit, millicores, and things get confusing again because CPU is measured in time. Here is what I understood: 100m = 100/1000 = 0.1 cores, so if my Linux kernel is set to sched_latency_ns=10000000 (10ms), then 100m should give me 1ms of CPU usage at a time (0.1 * 10ms = 1ms). So, a cgroup limited to 100m means my task runs for only 1ms at a time, no matter whether sched_latency_ns is greater than that.
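To make the arithmetic of my assumption concrete (this is just my mental model of how the limit would scale the slice, not necessarily what the kernel actually does):

```python
SCHED_LATENCY_NS = 10_000_000  # 10ms, the value from my example

def slice_under_my_model(millicores: int) -> float:
    """My assumption: the millicore limit simply scales the scheduling latency."""
    fraction = millicores / 1000                      # 100m -> 0.1 cores
    return fraction * SCHED_LATENCY_NS / 1_000_000    # result in ms

print(slice_under_my_model(100))  # 1.0 (ms) -> the 1ms figure above
```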
Sorry for the long text, but it's not an easy thing to explain, so I tried to be as clear as possible. Thanks anyway.
Upvotes: 1
Views: 1278
Reputation: 9022
Imagine you have 1 CPU core available and 1 pod that is assigned 1 CPU core via spec.resources.limits.cpu: 1. This means the pod is allowed to run for 1 CPU-second every second of real time. All processes running inside the pod share the same cgroup, so together they have 1 CPU-second at their disposal.
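For the mechanics behind that: the limit is enforced by CFS bandwidth control (cpu.cfs_quota_us / cpu.cfs_period_us), not by sched_latency_ns. Roughly, the kubelet derives a quota per period; a minimal sketch of that arithmetic, assuming the default 100ms period:

```python
CFS_PERIOD_US = 100_000  # default cpu.cfs_period_us: 100ms

def cfs_quota_us(millicores: int) -> int:
    """CPU time (us) the whole cgroup may consume per period."""
    return millicores * CFS_PERIOD_US // 1000

print(cfs_quota_us(1000))  # 100000 -> 100ms per 100ms period = 1 full core
print(cfs_quota_us(100))   # 10000  -> 10ms per 100ms period (the "100m" case)
```

So a 100m limit gives the whole cgroup 10ms of CPU time per 100ms period; once that's used up, every task in the cgroup is throttled until the next period starts.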
If you have one process inside the pod then this process will obviously run all the time.
If you have two processes running in the pod then each of them will run for half the time on average. This means the application will be throttled 50% of the time.
If you have 10 processes running then each of them will run for about 100ms per second. The application will be throttled 90% of the time.
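The arithmetic behind those three cases, as a quick sketch (assuming CPU-bound processes sharing the quota evenly, and using "throttled" loosely to mean "not on the CPU", as above):

```python
def share_per_process(cpu_limit_cores: float, nprocs: int) -> tuple[float, float]:
    """Per-process CPU-seconds per real second, and the fraction of time off-CPU."""
    share = cpu_limit_cores / nprocs
    throttled = 1.0 - share  # loose sense: fraction of time each process is not running
    return share, throttled

for n in (1, 2, 10):
    share, throttled = share_per_process(1.0, n)
    print(f"{n:2d} processes: {share * 1000:.0f}ms/s each, throttled {throttled:.0%}")
```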
It's possible to monitor the amount of time each container spends being throttled using the cAdvisor metrics container_cpu_cfs_throttled_seconds_total and container_cpu_cfs_throttled_periods_total.
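For example, a rough throttling ratio can be computed from two scrapes of those counters. A minimal sketch, where the sample values are made up and container_cpu_cfs_periods_total is the companion cAdvisor counter of total elapsed enforcement periods:

```python
# Two scrapes of the cAdvisor counters, some interval apart (values are made up).
t0 = {"container_cpu_cfs_throttled_periods_total": 1200,
      "container_cpu_cfs_periods_total": 3000}
t1 = {"container_cpu_cfs_throttled_periods_total": 1500,
      "container_cpu_cfs_periods_total": 3600}

def throttle_ratio(before: dict, after: dict) -> float:
    """Fraction of CFS periods in which the container was throttled."""
    throttled = (after["container_cpu_cfs_throttled_periods_total"]
                 - before["container_cpu_cfs_throttled_periods_total"])
    periods = (after["container_cpu_cfs_periods_total"]
               - before["container_cpu_cfs_periods_total"])
    return throttled / periods

print(f"throttled in {throttle_ratio(t0, t1):.0%} of periods")  # 50%
```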
The conclusion is that under high load you don't want to enable CPU limits.
Upvotes: 1