Bob

Reputation: 2303

Kubernetes can't start due to too many open files in system

I am trying to create a bunch of pods, services and deployments using Kubernetes, but I keep hitting the following error when I run the kubectl describe command.

for "POD" with RunContainerError: "runContainer: API error (500): Cannot start container bbdb58770a848733bf7130b1b230d809fcec3062b2b16748c5e4a8b12cc0533a: [8] System error: too many open files in system\n"

I have already terminated all pods and tried restarting the machine, but it doesn't solve the issue. I am not a Linux expert, so I am just wondering how I should find all the open files and close them?

Upvotes: 4

Views: 14684

Answers (2)

Rotem jackoby

Reputation: 22088

If the problem returns, you might need to change the ulimit value.

You haven't specified if you're running on a cloud provider or locally with tools like kind/minikube.

If you need to change the ulimit value on all nodes in a cluster, you can run a privileged DaemonSet that changes it:

image: busybox
command: ["sh", "-c", "ulimit -n 10000"]
securityContext:
  privileged: true

And then delete it.
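For reference, a minimal sketch of what such a DaemonSet manifest could look like, wrapping the container spec above (the name, label, and the 10000 value are just placeholders):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ulimit-setter        # placeholder name
spec:
  selector:
    matchLabels:
      app: ulimit-setter
  template:
    metadata:
      labels:
        app: ulimit-setter
    spec:
      containers:
      - name: ulimit-setter
        image: busybox
        # raise the limit, then idle so the pod doesn't crash-loop
        command: ["sh", "-c", "ulimit -n 10000 && sleep 3600"]
        securityContext:
          privileged: true

Apply it with kubectl apply -f, and once it has run on every node, remove it with kubectl delete daemonset ulimit-setter.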

If it's for a specific node, you can open a shell on it with:

  kubectl debug node/mynode -it --image=busybox

And then try to run the ulimit command.
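Inside that session you can first check where you stand; a quick sketch (fs.file-max is the kernel-wide limit that the "too many open files in system" error refers to):

ulimit -n                   # open-file limit for the current shell
cat /proc/sys/fs/file-max   # kernel-wide maximum number of open file handles
cat /proc/sys/fs/file-nr    # allocated / unused / maximum file handles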

If it's a Linux node and you get a permission error, first try raising the allowed limit in the /etc/limits.conf file (or /etc/security/limits.conf, depending on your Linux distribution) by adding this line:

* hard nofile 10000

Then log out and log back in, after which you can run:

ulimit -n 10000
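You may also want a matching soft line in the same file, so new sessions pick up the value without running ulimit by hand (assuming your distribution applies the file via pam_limits):

* soft nofile 10000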

If for some reason you can't run the ulimit command, try editing the Docker configuration.

If it's just for a quick debug and you need to apply the change to all containers on a node, try editing the /etc/docker/daemon.json configuration file:

{
  "default-ulimits": {
    "nofile": {
      "Name": "nofile",
      "Hard": 64000,
      "Soft": 64000
    }
  }
}

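After saving the file, restart the Docker daemon so the new defaults apply to newly started containers, for example:

sudo systemctl restart docker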
For a more permanent change you can, for example on EKS, add this via the user data:

sudo sed -i "s|ExecStart=.*|ExecStart=/usr/bin/dockerd --default-ulimit memlock=83968000:83968000|g" /usr/lib/systemd/system/docker.service

sudo systemctl restart docker.service
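You can then confirm the unit file was actually updated with, for example:

systemctl cat docker.service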

Upvotes: 1

CJ Cullen

Reputation: 5642

You can confirm which process is hogging file descriptors by running:

lsof | awk '{print $2}' | sort | uniq -c | sort -n

That will give you a sorted list of open FD counts along with the pid of each process. Then you can look up each process with:

ps -p <pid>
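For example, to see more detail about the top consumer (1234 is just a placeholder pid taken from the list above):

ps -p 1234 -o pid,user,comm,args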

If the main hogs are docker/kubernetes, then I would recommend following along on the issue that caesarxuchao referenced.

Upvotes: 13
