Reputation: 121
I have a master node that has disk pressure and is spamming the log with endless messages like these:
Mar 18 22:53:04 kubelet[7521]: W0318 22:53:04.413211 7521 eviction_manager.go:344] eviction manager: attempting to reclaim ephemeral-storage
Mar 18 22:53:04 kubelet[7521]: I0318 22:53:04.413235 7521 container_gc.go:85] attempting to delete unused containers
......................
Mar 18 22:53:04 kubelet[7521]: E0318 22:53:04.429446 7521 eviction_manager.go:574] eviction manager: cannot evict a critical pod kube-controller-manager_kube-system(5308d5632ec7d3e588c56d9f0bca17c8)
Mar 18 22:53:04 kubelet[7521]: E0318 22:53:04.429458 7521 eviction_manager.go:574] eviction manager: cannot evict a critical pod kube-apiserver_kube-system(9fdc5b37e61264bdf7e38864e765849a)
Mar 18 22:53:04 kubelet[7521]: E0318 22:53:04.429464 7521 eviction_manager.go:574] eviction manager: cannot evict a critical pod kube-scheduler_kube-system(90280dfce8bf44f46a3e41b6c4a9f551)
Mar 18 22:53:04 kubelet[7521]: E0318 22:53:04.429472 7521 eviction_manager.go:574] eviction manager: cannot evict a critical pod coredns-74ff55c5b-th722_kube-system(33744a13-8f71-4e36-8cfb-5955c5348a14)
Mar 18 22:53:04 kubelet[7521]: E0318 22:53:04.429478 7521 eviction_manager.go:574] eviction manager: cannot evict a critical pod coredns-74ff55c5b-d45hd_kube-system(65a5684e-5013-4683-aa38-820114260d63)
Mar 18 22:53:04 kubelet[7521]: E0318 22:53:04.429487 7521 eviction_manager.go:574] eviction manager: cannot evict a critical pod weave-net-wjs78_kube-system(f0f9a4e5-98a4-4df4-ac28-6bc1202ec06d)
Mar 18 22:53:04 kubelet[7521]: E0318 22:53:04.429493 7521 eviction_manager.go:574] eviction manager: cannot evict a critical pod kube-proxy-8dvws_kube-system(c55198f4-38bc-4adf-8bd8-4a2ec2d8a46d)
Mar 18 22:53:04 kubelet[7521]: E0318 22:53:04.429498 7521 eviction_manager.go:574] eviction manager: cannot evict a critical pod etcd_kube-system(e3f86cf1b5559dfe46a5167a548f8a4d)
Mar 18 22:53:04 kubelet[7521]: I0318 22:53:04.429502 7521 eviction_manager.go:396] eviction manager: unable to evict any pods from the node
..............
This has been going on for months. I know the disk pressure threshold is probably set to its default value, but WHERE is it configured in the first place?
I do know about this: https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/
It is probably this setting that can be set:
imagefs.available := node.stats.runtime.imagefs.available
(according to the link above)
But again, where? In etcd? How can I set this as a default for all nodes?
It is true that there is less space available than the threshold, but this is the control plane (there are no other pods on it) and not a production system; it is for testing only, and I can't see anything in the logs because Kubernetes spams them full of garbage. Garbage because these messages make absolutely no sense: these pods are never supposed to be evicted, they are essential, and eviction should not even be attempted for them.
My questions:
Upvotes: 2
Views: 7166
Reputation: 43
The kubelet is, in effect, the Kubernetes kernel and runs on each node. Similar to a Linux kernel, it manages critical node functions like memory and disk allocation, so disk pressure is configured there.
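For reference, you can see which thresholds a node is actually running with by querying the kubelet's configz debug endpoint through the API server proxy (a sketch; the node name below is a placeholder, and configz is a debug-only interface):

# Hypothetical node name; take a real one from `kubectl get nodes`
NODE=mymaster
# Dumps the kubelet's live configuration as JSON, including the evictionHard map
kubectl get --raw "/api/v1/nodes/${NODE}/proxy/configz"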
Aside from the obvious answer of deleting some files, you can control the disk pressure thresholds in the following ways:
Quick and Dirty
This is the old way, via command-line flags:
kubelet --eviction-hard 'imagefs.available<5%,memory.available<10Mi,nodefs.available<3%'
Which returns a deprecation warning:
Flag --eviction-hard has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag
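As an aside, on kubeadm-based installs these flags are usually not edited in the unit file itself but appended via the KUBELET_EXTRA_ARGS environment variable read by the kubelet systemd drop-in (a sketch under that assumption; the path differs by distro, and it does not apply to custom installs like the one shown below):

# /etc/default/kubelet on Debian/Ubuntu, /etc/sysconfig/kubelet on RHEL/CentOS
KUBELET_EXTRA_ARGS=--eviction-hard=imagefs.available<5%,memory.available<10Mi,nodefs.available<3%
# Restart so the kubelet picks up the new flags
sudo systemctl restart kubelet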
The right way
On most Linux systems we can check which config file is passed to the kubelet at startup with:
sudo systemctl status kubelet
Find the config file from the '--config' flag on the startup command shown under 'CGroup':
● kubelet.service - Kubernetes Kubelet Server
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2024-08-12 19:33:13 CDT; 36min ago
Docs: https://github.com/GoogleCloudPlatform/kubernetes
Main PID: 44420 (kubelet)
Tasks: 20
Memory: 54.8M
CGroup: /system.slice/kubelet.service
└─44420 /usr/local/bin/kubelet --v=2 --node-ip=0.0.0.0 --hostname-override=myhostname.com --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --config=/etc/kubernetes/kubelet-config.yaml --kubeconfig=/etc/kubernetes/kubelet.conf --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --runtime-cgroups=/system.slice/containerd.service --root-dir=/opt/services/k8s/kubelet
Append the required changes to the end of the file with vim:
vi /etc/kubernetes/kubelet-config.yaml
...
...
eventRecordQPS: 5
shutdownGracePeriod: 60s
shutdownGracePeriodCriticalPods: 20s
evictionHard:
  memory.available: "100Mi"
  nodefs.available: "3%"
  nodefs.inodesFree: "5%"
  imagefs.available: "5%"
Restart the kubelet service to apply the changes on this node only:
sudo systemctl restart kubelet
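To get the same defaults onto every node, one option, assuming the cluster was created with kubeadm (the question does not say, so this is an assumption), is to change the kubelet configuration stored in the kube-system ConfigMap that kubeadm uses for joins and upgrades, then refresh each node from it:

# Edit the cluster-wide kubelet defaults kept by kubeadm
# (ConfigMap is named kubelet-config, or kubelet-config-<version> on older releases)
kubectl edit configmap -n kube-system kubelet-config

# On each existing node: write the updated config to the node's config file and restart
sudo kubeadm upgrade node phase kubelet-config
sudo systemctl restart kubelet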
Your other questions are vague.
Upvotes: 1
Reputation: 54211
There are three ways to set kubelet options. The first is command-line flags like --eviction-hard. The next is a config file. The most recent is dynamic configuration.
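For the config-file route, here is a minimal sketch of what the file passed via --config can look like (field names from the kubelet.config.k8s.io/v1beta1 KubeletConfiguration API; the threshold values are placeholders, not recommendations):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Hard eviction thresholds; the kubelet starts evicting once any of these is crossed
evictionHard:
  memory.available: "100Mi"
  nodefs.available: "3%"
  nodefs.inodesFree: "5%"
  imagefs.available: "5%"

The kubelet is then started with something like kubelet --config=/var/lib/kubelet/config.yaml (the path is install-specific).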
Of course the better answer here is to free up some disk space.
Upvotes: 2