Reputation: 38651
Today the disk on one of my Kubernetes cluster (v1.15.2) nodes filled up, which caused pods to report these errors:
Update plugin resources failed due to failed to write checkpoint file "kubelet_internal_checkpoint": write /var/lib/kubelet/device-plugins/.261578065: no space left on device, which is unexpected.
MountVolume.SetUp failed for volume "default-token-xnrwt" : mkdir /opt/k8s/k8s/kubelet/pods/67eaa71b-adf4-4365-a1c7-42045d5e9426: no space left on device
I logged into the server and found disk usage at 100%, so I removed some log files and freed 10GB+ of disk space. But the pods do not seem to recover automatically and still show the same errors.
What should I do to fix this problem? I tried restarting all the pods, and they all work fine now. But the error message still says there is no space left, and it does not disappear on its own. I checked the node status and the node has no disk pressure. How do I make the error message disappear?
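Two quick checks that can help here (standard kubectl commands; the node and pod names are placeholders):

$ kubectl describe node <node-name> | grep -A 6 Conditions   # confirm DiskPressure is False
$ kubectl get events --field-selector involvedObject.name=<pod-name>

Note that stale events are not cleaned up immediately: the API server keeps them for the event TTL (one hour by default), so old "no space left on device" events can linger for a while after the underlying problem is fixed.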
Upvotes: 15
Views: 40613
Reputation: 1833
In my case, the Docker image was huge (>7GB) and my EKS nodes had only 20GB of storage. I was facing this issue:
Failed to pull image "XXXX.dkr.ecr.us-east-1.amazonaws.com/image-name:xxxxxxxx": rpc error: code = Unknown desc = failed to pull and unpack image "XXXXX.dkr.ecr.us-east-1.amazonaws.com/content-intel-xxxxxx:xxxxxxxx": failed to copy: write /var/lib/containerd/io.containerd.content.v1.content/ingest/xxxxxxxxxxxxxx/data: no space left on device
To fix this, I created a new node group with increased disk storage (50GB).
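If you use eksctl, a node group with a larger root volume can be created along these lines (the cluster and node group names are placeholders; --node-volume-size is in GiB):

$ eksctl create nodegroup \
    --cluster my-cluster \
    --name larger-disk-nodes \
    --node-volume-size 50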
Upvotes: 0
Reputation: 713
Another possibility is incorrect unit values for resource requests/limits (e.g., using mi instead of Mi).
For example:
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: {container_name}
    resources:
      limits:
        memory: "512mi" # incorrect; should be "512Mi"
        cpu: "200m"
Upvotes: 45
Reputation: 461
The other answers are correct but fail to explain why this happens. Kubernetes uses tmpfs (memory-backed, sized from resources.limits.memory) mounts for /tmp and /run, and tmpfs is constrained by the available memory resources.
Leaving the suffix off the memory limit means you are trying to start a pod with that many bytes of memory (rather than the Mi or Gi you intended). Each filesystem inode takes up 4k, and those inodes must persist for the life of the tmpfs volume.
Kubernetes does a poor job of bubbling up these errors. They are real and describe a real state, but kube events make it look like your node is blowing up or the workload in the pod is doing something horrible.
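As a sketch of how this surfaces: a memory-backed emptyDir is the standard way a pod gets a tmpfs mount, and writes to it count against the container's memory limit (the pod and container names here are hypothetical):

apiVersion: v1
kind: Pod
metadata:
  name: tmpfs-demo
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "df -h /tmp && sleep 3600"]
    resources:
      limits:
        memory: "512Mi" # files written to the tmpfs count against this limit
    volumeMounts:
    - name: tmp
      mountPath: /tmp
  volumes:
  - name: tmp
    emptyDir:
      medium: Memory # memory-backed tmpfs volume

With a mis-set, near-zero memory limit, the same mount fills immediately, which is why the error looks like a disk problem even though the node's disks are fine.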
Upvotes: 0
Reputation: 14082
Posting this as a Community Wiki, as the solution was mentioned in the comment section.
Errors like no space left on device, which is unexpected. and no space left on device occur when your application is using 100% of the available space. You can check this with the command $ df -h.
Solution
To resolve this kind of issue, you have to "make some space" in the volume. You can do this by manually removing files (as the OP did in this scenario).
Once you have made some space, you should restart kubelet using $ systemctl restart kubelet.
The above steps resolved the OP's issue.
In addition, in some specific scenarios you might also need to restart the Docker service using $ service docker restart, or restart the specific affected resource.
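Put together, a recovery sketch for the node looks like this (the cleanup commands are only examples; what is safe to remove depends on the node):

$ df -h                           # find the filesystem that is full
$ docker system prune -af         # remove unused images and stopped containers
$ journalctl --vacuum-size=500M   # trim systemd journal logs
$ systemctl restart kubelet       # restart kubelet so the stale state clears
$ service docker restart          # only if the container runtime is also stuck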
Upvotes: 4