Reputation: 123
I am trying to set up a Job on a GKE Autopilot cluster.
The Job restores database backups, so it needs to download and decompress very large files (around 50-100Gi).
However, Autopilot pods have an ephemeral storage limit of 10Gi, so I followed this guide to use an ephemeral volume instead:
https://cloud.google.com/kubernetes-engine/docs/how-to/generic-ephemeral-volumes
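For readers following along, a Job that uses a generic ephemeral volume might look roughly like the sketch below. This is not the manifest from the question: the Job name, container name, image, mount path, and the `standard-rwo` storage class are all assumptions made for illustration.

```yaml
# Sketch only: every name, the image, and the storage class are hypothetical.
apiVersion: batch/v1
kind: Job
metadata:
  name: restore-backup                       # hypothetical Job name
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: restore
        image: example.com/db-restore:latest # hypothetical image
        volumeMounts:
        - name: scratch
          mountPath: /scratch                # hypothetical mount path
      volumes:
      - name: scratch
        ephemeral:
          volumeClaimTemplate:
            spec:
              accessModes: ["ReadWriteOnce"]
              storageClassName: standard-rwo # assumed storage class
              resources:
                requests:
                  storage: 100Gi
```

Only data written under the mount path (`/scratch` in this sketch) lands on the volume. Writes elsewhere in the container filesystem, or into an `emptyDir`, still count against the container's `ephemeral-storage` limit.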
I have confirmed that the volume is indeed available to the pod by using the command:
kubectl exec -it deploy/ephemeral-deployment -- bash
So the volume is being created, mounted, and available to the Job, giving it the 100Gi of space it needs. Despite this, the Job keeps failing and I get the error message:
Pod ephemeral local storage usage exceeds the total limit of containers 1Gi.
I did some research and found that it is due to the resource limits set in the YAML file:
resources:
  limits:
    cpu: "5"
    ephemeral-storage: 1Gi   # <-- this limit
    memory: 6Gi
  requests:
    cpu: "5"
    ephemeral-storage: 1Gi
    memory: 6Gi
The problem is that I can't remove the limits. If I create the Job without them in the YAML, they are added automatically. If I increase them, they are reset to the 10GB limit.
Either way, I can't use the 100GB I have set up on the ephemeral volume. It's almost as if the cluster is fighting itself.
Is there any way around this?
Upvotes: 0
Views: 442
Reputation: 550
This is a new capability on GKE Autopilot clusters; you can read about it in this article.
Things to consider in order to use more ephemeral storage on GKE Autopilot:
Upgrade to version 1.28.6-gke.1095000 or later.
Use the Performance compute class with a supported machine family such as C3 or C3D.
Use the sample YAML below as a reference:
apiVersion: v1
kind: Pod
metadata:
  name: performance-pod
spec:
  nodeSelector:
    cloud.google.com/compute-class: Performance
    cloud.google.com/machine-family: c3d
  containers:
  - name: my-container
    image: "k8s.gcr.io/pause"
    resources:
      requests:
        cpu: 4
        memory: "16Gi"
        ephemeral-storage: 100Gi
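Since the question is about a Job rather than a bare Pod, the same node selector and resource requests could be placed in a Job's pod template. The following is a minimal sketch under that assumption: the Job name, container name, and image are hypothetical, and the resource values are copied from the sample above rather than tuned for a restore workload.

```yaml
# Sketch only: the Job name, container name, and image are hypothetical.
apiVersion: batch/v1
kind: Job
metadata:
  name: restore-backup                       # hypothetical Job name
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      nodeSelector:
        cloud.google.com/compute-class: Performance
        cloud.google.com/machine-family: c3d
      containers:
      - name: restore
        image: example.com/db-restore:latest # hypothetical image
        resources:
          requests:
            cpu: 4
            memory: "16Gi"
            ephemeral-storage: 100Gi
```

A Job's pod template must set `restartPolicy` to `Never` or `OnFailure`; `Never` is used here so that a failed restore is not retried in place.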
Upvotes: 0