Ensuring availability in Kubernetes with high-variance memory / CPU load?

Question

Problem: the code we're running in Kubernetes Pods has very high variance in resource usage over its runtime; specifically, it has occasional CPU and memory spikes when certain conditions are triggered. These triggers involve user queries with hard real-time requirements (the system has to respond in under 5 seconds).

When the node hosting the spiking pod doesn't have enough CPU/RAM, Kubernetes responds to the excess demand by killing the pod altogether, which means the query gets no response at all.

How can we ensure that these spikes are taken into account when pods are scheduled and, more critically, that no pod is shut down for these reasons?

Thanks!

Here_2_learn · Accepted Answer

High availability of pods under load can be achieved in two ways:

Configuring More CPU/Memory

As the application requires more CPU/memory at peak times, configure the pod so that its allocated resources cover the extra load. Configure the pod something like this:

resources:
  requests:
    memory: "64Mi"
    cpu: "250m"
  limits:
    memory: "128Mi"
    cpu: "500m"

You can increase the limits based on usage. But this approach can cause two issues:

1) Underutilized resources

Because a large amount of resources is reserved for the pod, it goes to waste unless there is a spike in traffic.

2) Deployment failure

The pod may fail to schedule if no Kubernetes node has enough free resources to satisfy the request.

For more info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
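To show where the resources block above sits in a full manifest, here is a minimal sketch of a Deployment sized for the spike rather than the average. The name `spiky-app`, the image, and the resource values are hypothetical placeholders, not taken from the question; pick values from your own measured peak usage.

```yaml
# Hypothetical Deployment: name, image, and resource values are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spiky-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: spiky-app
  template:
    metadata:
      labels:
        app: spiky-app
    spec:
      containers:
        - name: app
          image: registry.example.com/spiky-app:1.0
          resources:
            # The scheduler places the pod only on a node that has
            # this much unreserved CPU and memory.
            requests:
              memory: "512Mi"
              cpu: "1"
            # A container exceeding its memory limit is OOM-killed;
            # one exceeding its CPU limit is throttled, not killed.
            limits:
              memory: "512Mi"
              cpu: "1"
```

In this sketch the requests equal the limits for every resource, which places the pod in the Guaranteed QoS class; such pods are among the last to be evicted when a node runs short of resources. The cost is the underutilization described above, since the peak allocation is reserved all the time.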

Autoscaling

The ideal way is to autoscale the pods based on traffic.

kubectl autoscale deployment <deployment-name> --cpu-percent=50 --min=1 --max=10

Set cpu-percent based on your requirements; otherwise it defaults to 80%. Min and max are the lower and upper bounds on the number of pods and can be configured accordingly.

So each time the average CPU utilization across the pods exceeds 50% of their requested CPU, new pods are launched, up to a maximum of 10. The same applies in reverse: pods are removed, down to the minimum, when the load drops.
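The scaling behaviour described above follows the replica calculation given in the Kubernetes Horizontal Pod Autoscaler documentation. As a rough worked example, assuming 2 running pods that average 100% of their requested CPU against the 50% target:

```latex
\[
\text{desiredReplicas}
  = \left\lceil \text{currentReplicas} \times
    \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
  = \left\lceil 2 \times \frac{100}{50} \right\rceil
  = 4
\]
```

The result is then clamped to the configured min and max. Utilization is measured relative to each container's CPU request, so the pods need a CPU request set for this to work.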

For more info: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/
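The same autoscaling policy can also be written declaratively. This is a minimal sketch using the `autoscaling/v2` API; the Deployment name `spiky-app` is a hypothetical placeholder and should match your own Deployment.

```yaml
# Hypothetical HorizontalPodAutoscaler: the target name is a placeholder.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: spiky-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: spiky-app
  minReplicas: 1        # same as --min=1
  maxReplicas: 10       # same as --max=10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # same as --cpu-percent=50
```

Apply it with kubectl apply -f and check its state with kubectl get hpa. Keeping the policy in a manifest lets it be versioned alongside the Deployment it scales.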
