Reputation: 991
All the pods on a node are in the Evicted state due to "The node was low on resource: ephemeral-storage."
portal-59978bff4d-2qkgf 0/1 Evicted 0 14m
release-mgmt-74995bc7dd-nzlgq 0/1 Evicted 0 8m20s
service-orchestration-79f8dc7dc-kx6g4 0/1 Evicted 0 7m31s
test-mgmt-7f977567d6-zl7cc 0/1 Evicted 0 8m17s
Does anyone know a quick fix for this?
Upvotes: 65
Views: 163602
Reputation: 1083
This issue happens when the node runs out of temporary storage, for example because the application writes temporary or cache data while processing its jobs.
To resolve it, exec into the pod while the process is running, check which mount point is consuming the available storage with df -h
, and look at the remaining capacity. You can then create a PVC (backed by hostPath or another storage class) with a larger size and mount it at the directory where the pod stores its temporary data.
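For example, a minimal sketch of such a PVC and mount (the claim name temp-data-pvc, the size, and the path /app/tmp are placeholders; use the storage class your cluster provides):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: temp-data-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: app-with-temp-volume
spec:
  containers:
  - name: app
    image: my-app:latest              # placeholder image
    volumeMounts:
    - name: temp-data
      mountPath: /app/tmp             # placeholder: the directory df -h showed filling up
  volumes:
  - name: temp-data
    persistentVolumeClaim:
      claimName: temp-data-pvc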
Upvotes: 5
Reputation: 3840
In my case the problem was that the nodes were filling up with Docker images, some of them unused and never pruned, others far too big. To confirm it, first SSH into the node and check whether the disk is (nearly) full. For instance:
[root@node-name ~]# df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/nvme0n1p1 20G 15G 5.9G 71% /
It's possible to find out which image specifically occupies the most space, and I recommend doing so. Check this excellent resource to see how: https://rharshad.com/eks-troubleshooting-disk-pressure/
Knowing which image takes the most space, and investigating its filesystem to see why, can be useful for optimizing image size, but that's a different topic.
If you can't add more storage to the node, it's possible to clean it up with docker prune. But first we need to make sure no containers are running, so let's drain the node:
kubectl drain node-name
Note that the node will be cordoned after it's drained, which means no new pods will be scheduled on it. Back inside the node, let's prune the unused Docker resources:
[root@node-name ~]# docker system prune --all
WARNING! This will remove:
- all stopped containers
- all networks not used by at least one container
- all images without at least one container associated to them
- all build cache
Are you sure you want to continue? [y/N] y
Deleted Containers:
8333683571a2ceff47bf08cc254f8fa3809acacc7fb981be3c1c274e9465dd68
28bdc62425707127ac977d20fd3dc85374ffc54ccccf2b2f2098d9af9ca3c898
7315014bfd9207c5a1b8e76ef0f1567bb5e221de6fe0304f4728218abd7e1f3f
b0f5ecb854a9f4b41610d7ec5b556447600f57529e68ae2093d1d40df02ff214
9e24227321d5e151bc665c55bcd474c9d586857cbac3cad744aad2dc11729e5e
63ab1bf7ded78d4b77db22f9c1aaac6a55247c71ca55b51caa8492f2b16c4d69
...
Total reclaimed space: 4.529GB
Then check the storage space again:
[root@node-name ~]# df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/nvme0n1p1 20G 8.9G 12G 45% /
Now let's make the node schedulable again, running kubectl from the host (here via the Rancher CLI):
rancher kubectl uncordon node-name
Upvotes: 7
Reputation: 2796
My problem was that my pod was writing to a folder that was not defined in the volumeMounts of the deployment.
volumeMounts:
  - name: my-data-volume
    mountPath: "/the/path/thatImounted"
My pod wrote to a different path than "/the/path/thatImounted".
The solution in this case is either to add the path that the pod writes to as an additional mountPath, or to fix the wrong mountPath.
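For illustration, a sketch with the extra mount added (the volume name scratch and the second path are placeholders, not taken from the original deployment):
containers:
  - name: app
    image: my-app:latest                          # placeholder
    volumeMounts:
      - name: my-data-volume
        mountPath: "/the/path/thatImounted"
      - name: scratch
        mountPath: "/the/path/the-pod/writes-to"  # placeholder for the real write path
volumes:
  - name: my-data-volume
    persistentVolumeClaim:
      claimName: my-data-pvc                      # placeholder claim name
  - name: scratch
    emptyDir: {}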
Upvotes: 1
Reputation: 2191
If you don't set limits.ephemeral-storage and requests.ephemeral-storage, by default pods are allowed to use all of the node's storage space. So set limits.ephemeral-storage and requests.ephemeral-storage:
apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
  - name: app
    image: images.my-company.example/app:v4
    resources:
      requests:
        ephemeral-storage: "2Gi"
      limits:
        ephemeral-storage: "4Gi"
Or configure the Docker logging driver to limit the amount of stored logs in the file /etc/docker/daemon.json (by default this file doesn't exist; you must create it):
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "2"
  }
}
Upvotes: 3
Reputation: 7849
Please consider the following factors:
Upvotes: -5
Reputation: 21
You can increase the size of the attached EBS volume and restart the EC2 instance for the change to take effect.
Upvotes: -3
Reputation: 44677
Pods that use emptyDir volumes without storage quotas will fill up this storage, and the following error appears:
eviction manager: attempting to reclaim ephemeral-storage
Set limits.ephemeral-storage and requests.ephemeral-storage to limit this; otherwise any container can write any amount of data to its node's filesystem.
A sample ResourceQuota definition:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources
spec:
  hard:
    pods: "4"
    requests.cpu: "1"
    requests.memory: 1Gi
    requests.ephemeral-storage: 2Gi
    limits.cpu: "2"
    limits.memory: 2Gi
    limits.ephemeral-storage: 4Gi
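For the emptyDir case mentioned above, you can also cap the volume itself with sizeLimit; a pod that exceeds the limit is evicted instead of exhausting the node's disk. A minimal sketch (pod and volume names are placeholders):
apiVersion: v1
kind: Pod
metadata:
  name: scratch-pod
spec:
  containers:
  - name: app
    image: images.my-company.example/app:v4
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  volumes:
  - name: scratch
    emptyDir:
      sizeLimit: 1Gi          # kubelet evicts the pod if usage goes above this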
Another reason for this issue can be log files eating disk space. Check this question.
Upvotes: 43