Reputation: 983
My team and I are trying to deploy very compute-heavy workloads on GCP serverless infrastructure. Since Cloud Run has very narrow resource limits (4 vCPUs & 8 GB memory), we are testing GKE with Autopilot next.
With a default Autopilot cluster, I managed to provision a single deployment & container with up to 8 vCPUs, but no more.
My question now is whether there is a way to deploy a deployment & container with resources.requests.cpu > 8 and, if so, how.
So far I've tried the following. Here's my deployment.yaml:
---
apiVersion: "apps/v1"
kind: "Deployment"
metadata:
  name: "backend-flask"
  namespace: "default"
  labels:
    app: "backend-flask"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: "backend-flask"
  template:
    metadata:
      labels:
        app: "backend-flask"
    spec:
      containers:
        - name: "backend-flask1"
          image: "{...}backend-flask:latest"
          resources:
            requests:
              memory: "6Gi"
              cpu: "8"
            limits:
              memory: "32Gi"
              cpu: "32"
      # nodeSelector:
      #   beta.kubernetes.io/instance-type: e2-highcpu-32
---
# apiVersion: autoscaling.gke.io/v1beta1
# kind: MultidimPodAutoscaler
# metadata:
#   name: backend-flask-autoscaler
# spec:
#   scaleTargetRef:
#     apiVersion: apps/v1
#     kind: Deployment
#     name: backend-flask
#   goals:
#     metrics:
#       - type: Resource
#         resource:
#           # Define the target CPU utilization request here
#           name: cpu
#           target:
#             type: Utilization
#             averageUtilization: 80
#   constraints:
#     global:
#       minReplicas: 1
#       maxReplicas: 2
#     containerControlledResources: [ memory ]
#     container:
#       - name: '*'
#         # Define boundaries for the memory request here
#         requests:
#           minAllowed:
#             memory: 4Gi
#             cpu: 4
#           maxAllowed:
#             memory: 32Gi
#             cpu: 32
#   policy:
#     updateMode: Auto
# ---
apiVersion: "autoscaling/v2beta1"
kind: "HorizontalPodAutoscaler"
metadata:
  name: "backend-flask-horizontal-autoscaler"
  namespace: "default"
  labels:
    app: "backend-flask"
spec:
  scaleTargetRef:
    kind: "Deployment"
    name: "backend-flask"
    apiVersion: "apps/v1"
  minReplicas: 1
  maxReplicas: 1
  metrics:
    - type: "Resource"
      resource:
        name: "cpu"
        targetAverageUtilization: 80
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: backend-flask-horizontal-autoscaler
  namespace: "default"
  labels:
    app: "backend-flask"
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: backend-flask
  updatePolicy:
    updateMode: "Auto"
---
apiVersion: "v1"
kind: "Service"
metadata:
  name: "backend-flask-service"
  namespace: "default"
  labels:
    app: "backend-flask"
spec:
  ports:
    - protocol: "TCP"
      port: 5000
      targetPort: 5000
  selector:
    app: "backend-flask"
  type: "LoadBalancer"
  loadBalancerIP: ""
Upvotes: 0
Views: 532
Reputation: 983
Turns out it really was a quota issue. For some reason the quotas constantly showed more instances than I was actually using at the time.
Increasing quotas only took effect after deleting & recreating the cluster.
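For reference, roughly what the quota and cluster steps look like on the command line (a sketch only; the region and cluster name are placeholders, and the quota increase itself is requested through the console under IAM & Admin > Quotas):

# Compare the regional CPU quota limits against current usage
gcloud compute regions describe europe-west1 --format="yaml(quotas)"

# After the higher quota is granted, recreate the Autopilot cluster
# so the new limits actually take effect
gcloud container clusters delete my-autopilot-cluster --region europe-west1
gcloud container clusters create-auto my-autopilot-cluster --region europe-west1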
Finally, my own autoscalers messed with my deployment, as they were changing the specified resources in between requests.
Thank you for your answers, @GariSingh. I was also able to deploy up to 24 CPUs once I removed the autoscalers and increased the quotas.
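In case it helps anyone, this is roughly what the cleanup looked like (a sketch; the resource names come from the manifests above, and the VerticalPodAutoscaler type is only available if VPA is enabled in the cluster):

# Remove the autoscalers so they stop rewriting the pod's resource requests
kubectl delete hpa backend-flask-horizontal-autoscaler
kubectl delete verticalpodautoscaler backend-flask-horizontal-autoscaler

# Raise resources.requests.cpu in deployment.yaml (e.g. to "24") and reapply
kubectl apply -f deployment.yaml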
Upvotes: 0