Dolphin

Reputation: 38651

no space left on device, which is unexpected. MountVolume.SetUp failed for volume in kubernetes cluster

Today a node in my Kubernetes cluster (v1.15.2) ran out of disk space, and pods started reporting these errors:

Update plugin resources failed due to failed to write checkpoint file "kubelet_internal_checkpoint": write /var/lib/kubelet/device-plugins/.261578065: no space left on device, which is unexpected.
MountVolume.SetUp failed for volume "default-token-xnrwt" : mkdir /opt/k8s/k8s/kubelet/pods/67eaa71b-adf4-4365-a1c7-42045d5e9426: no space left on device

I logged into the server and found that disk usage was 100%, so I removed some log files and freed 10GB+ of disk space. However, the pods do not seem to recover automatically and still show this error:


What should I do to fix this problem? I tried restarting all the pods, and they work fine, but the "no space left" error message is still shown and does not disappear automatically. I checked the node status and the node has no disk pressure. How can I make the error message disappear?

Upvotes: 15

Views: 40613

Answers (4)

Mithun Biswas

Reputation: 1833

In my case, the Docker image was huge (>7GB) and my EKS nodes had only 20GB of storage. I was facing this issue:

Failed to pull image "XXXX.dkr.ecr.us-east-1.amazonaws.com/image-name:xxxxxxxx": rpc error: code = Unknown desc = failed to pull and unpack image "XXXXX.dkr.ecr.us-east-1.amazonaws.com/content-intel-xxxxxx:xxxxxxxx": failed to copy: write /var/lib/containerd/io.containerd.content.v1.content/ingest/xxxxxxxxxxxxxx/data: no space left on device

To fix this, I created a new node group with an increased disk size (50GB).
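For reference, here is a minimal sketch of what that could look like as an eksctl ClusterConfig; the cluster name, region, instance type, and node group name are placeholders, and volumeSize is the size of each node's root volume in GiB:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster            # placeholder cluster name
  region: us-east-1           # placeholder region
managedNodeGroups:
  - name: bigger-disk-nodes   # hypothetical node group name
    instanceType: m5.large    # placeholder instance type
    desiredCapacity: 2
    volumeSize: 50            # root volume size in GiB, up from the previous 20

You would then create the group with something like $ eksctl create nodegroup --config-file=cluster.yaml and move the workloads onto the new nodes.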

Upvotes: 0

v1d3rm3

Reputation: 713

Another possibility is incorrect unit suffixes in resource requests/limits (e.g., using mi instead of Mi).

For example:

apiVersion: v1
kind: Pod
metadata:
  name: {pod_name}
spec:
  containers:
    - name: {container_name}
      image: {image}
      resources:
        limits:
          memory: "512mi" # incorrect unit suffix; should be "512Mi"
          cpu: "200m"

Upvotes: 45

gladiatr72

Reputation: 461

The other answers are correct but fail to explain why this happens. Kubernetes uses tmpfs (memory-backed) mounts for /tmp and /run, and their capacity is governed by resources.limits.memory.

tmpfs filesystems are constrained by the available memory resources.

Leaving the suffix off the memory limit (or using an invalid one) means you would be trying to start a pod with only (the number you thought meant Mi or Gi) bytes of memory assigned. Each filesystem inode takes up 4K.

Those inodes must persist for the life of the tmpfs volume. Kubernetes does a horrible job at bubbling up these sorts of errors: they are real and describe a real state, but the kube events make it look like your node is blowing up or the workload in the pod is doing something horrible.
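To illustrate the relationship between memory limits and memory-backed volumes, here is a minimal sketch (the pod name, image, and sizes are just placeholders): an emptyDir with medium: Memory is a tmpfs, and the files written to it count against the container's memory limit.

apiVersion: v1
kind: Pod
metadata:
  name: tmpfs-demo                # hypothetical pod name
spec:
  containers:
    - name: app
      image: busybox              # placeholder image
      command: ["sleep", "3600"]
      resources:
        limits:
          memory: "512Mi"         # files in the tmpfs volume count against this limit
          cpu: "100m"
      volumeMounts:
        - name: scratch
          mountPath: /tmp
  volumes:
    - name: scratch
      emptyDir:
        medium: Memory            # memory-backed (tmpfs) volume
        sizeLimit: 256Mi          # optional cap on the volume's size

If the memory limit ends up accidentally tiny (for example because of a bad unit suffix), the tmpfs has essentially no room, which surfaces as "no space left on device".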

Upvotes: 0

PjoterS

Reputation: 14082

Posting this as a Community Wiki answer, since the solution was mentioned in the comment section.

Errors like no space left on device, which is unexpected and no space left on device occur when your application is using 100% of the available disk space. You can check this with the command $ df -h.

Solution

To resolve this kind of issue, you have to "make some space" on the volume. You can do that by manually removing files (as the OP did in this scenario).
Once you have freed some space, restart the kubelet using $ systemctl restart kubelet.

The above steps resolved the OP's issue.

In addition, in some specific scenarios you might also need to restart the Docker service using $ service docker restart, or restart the specific affected resource.

Upvotes: 4
