Reputation: 2040
I have a service deployed in Kubernetes and I am trying to optimize the requested cpu resources.
For now, I have deployed 10 instances and set spec.containers[].resources.limits.cpu
to 0.1
, based on the "average" use. However, it became obvious that this average is rather useless in practice because under constant load, the load increases significantly (to 0.3-0.4 as far as I can tell).
What happens consequently, when multiple instances are deployed on the same node, is that this node is heavily overloaded; pods are no longer responsive, are killed and restarted etc.
What is the best practice to find a good value? My current best guess is to increase the requested cpu to 0.3 or 0.4; I'm looking at Grafana visualizations and see that the pods on the heavily loaded node(s) converge there under continuous load. However, how can I know if they would use more load if they could before becoming unresponsive as the node is overloaded?
I'm actually trying to understand how to approach this in general. I would expect an "ideal" service (presuming it is CPU-focused) to use close to 0.0
when there is no load, and close to 1.0 when requests are constantly coming in. With that assumption, should I set the cpu.requests
to 1.0
, taking a perspective where actual constant usage is assumed?
I have read some Kubernetes best practice guides, but none of them seem to address how to set the actual value for cpu requests in practice in more depth than "find an average".
Upvotes: 0
Views: 695
Reputation: 5457
In a nutshell: Main goal is to understand how much traffic a pod can handle and how much resource it consumes to do so.
CPU limits are hard to understand and can be harmful, you might want to avoid them, see static policy documentation and relevant github issue.
To dimension your CPU requests you will want to understand first how much a pod can consume during high load. In order to do this you can :
You will eventually end up with a ratio clients-or-requests-per-sec/cpu-consumed
. You can suppose the relation is linear (this might not be true if your workload complexity is O(n^2) with n the number of clients connected, but this is not the nominal case).
You can then choose the pod resource requests based on the ratio you measured. For example if you consume 1.2 cpu for 1000 requests per second you know that you can give each pod 1 cpu and it will handle up to 800 requests per second.
Once you know how much a pod can consume under its maximal load, you can start setting up cpu-based autoscaling, 70% is a good first target that can be refined if you encounter issues like latency or pods not autoscaling fast enough. This will avoid your nodes to run out of cpu if the load increases.
There are a few gotchas, for example single-threaded applications are not able to consume more than a cpu. Thus if you give it 1.5 cpu it will run out of cpu but you won't be able to visualize it from metrics as you'll believe it still can consume 0.5 cpu.
Upvotes: 2
Reputation: 54211
Basically come up with a number that is your lower acceptable bound for how much the process runs. Setting a request of 100m
means that you are okay with a lower limit of your process running 0.1 seconds for every 1 second of wall time (roughly). Normally that should be some kind of average utilization, usually something like a P99 or P95 value over several days or weeks. Personally I usually look at a chart of P99, P80, and P50 (median) over 30 days and use that to decide on a value.
Limits are a different beast, they are setting your CPU timeslice quota. This subsystem in Linux has some persistent bugs so unless you've specifically vetted your kernel as correct, I don't recommend using it for anything but the most hostile of programs.
Upvotes: 1