masus04

Reputation: 983

Vertical / Horizontal Scaling for GKE Autopilot

My team and I are trying to deploy very compute-heavy workloads on GCP serverless infrastructure. Since Cloud Run has very narrow resource limits (4 vCPUs & 8 GB memory), we are testing GKE with Autopilot next.

With a default Autopilot cluster, I managed to provision a single deployment & container with up to 8 vCPUs, but no more.

My question now is whether there is a way to deploy a deployment & container with resources.requests.cpu > 8 and, if so, how.
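Concretely, the field I mean would be something like this (the value is just an example of going above the 8 vCPUs that currently work):

resources:
  requests:
    cpu: "16"   # anything above 8 has not worked for me so far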

So far I've tried the following.

Here's my deployment.yaml:

---
apiVersion: "apps/v1"
kind: "Deployment"
metadata:
  name: "backend-flask"
  namespace: "default"
  labels:
    app: "backend-flask"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: "backend-flask"
  template:
    metadata:
      labels:
        app: "backend-flask"
    spec:
      containers:
      - name: "backend-flask1"
        image: "{...}backend-flask:latest"
        resources:
          requests:
            memory: "6Gi"
            cpu: "8"
          limits:
            memory: "32Gi"
            cpu: "32"
      # nodeSelector:
      #   beta.kubernetes.io/instance-type: e2-highcpu-32
---
# apiVersion: autoscaling.gke.io/v1beta1
# kind: MultidimPodAutoscaler
# metadata:
#   name: backend-flask-autoscaler
# spec:
#   scaleTargetRef:
#     apiVersion: apps/v1
#     kind: Deployment
#     name: backend-flask
#   goals:
#     metrics:
#     - type: Resource
#       resource:
#       # Define the target CPU utilization request here
#         name: cpu
#         target:
#           type: Utilization
#           averageUtilization: 80
#   constraints:
#     global:
#       minReplicas: 1
#       maxReplicas: 2
#     containerControlledResources: [ memory ]
#     container:
#     - name: '*'
#     # Define boundaries for the memory request here
#       requests:
#         minAllowed:
#           memory: 4Gi
#           cpu: 4
#         maxAllowed:
#           memory: 32Gi
#           cpu: 32
#   policy:
#     updateMode: Auto
# ---
apiVersion: "autoscaling/v2beta1"
kind: "HorizontalPodAutoscaler"
metadata:
  name: "backend-flask-horizontal-autoscaler"
  namespace: "default"
  labels:
    app: "backend-flask"
spec:
  scaleTargetRef:
    kind: "Deployment"
    name: "backend-flask"
    apiVersion: "apps/v1"
  minReplicas: 1
  maxReplicas: 1
  metrics:
  - type: "Resource"
    resource:
      name: "cpu"
      target:
        type: "Utilization"
        averageUtilization: 80
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: backend-flask-vertical-autoscaler
  namespace: "default"
  labels:
    app: "backend-flask"
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind:       Deployment
    name:       backend-flask
  updatePolicy:
    updateMode: "Auto"
---
apiVersion: "v1"
kind: "Service"
metadata:
  name: "backend-flask-service"
  namespace: "default"
  labels:
    app: "backend-flask"
spec:
  ports:
  - protocol: "TCP"
    port: 5000
    targetPort: 5000
  selector:
    app: "backend-flask"
  type: "LoadBalancer"
  loadBalancerIP: ""

Upvotes: 0

Views: 532

Answers (1)

masus04

Reputation: 983

Turns out it really was a quota issue. For some reason, the quotas constantly showed more instances than I was actually using at the time.

Increasing quotas only took effect after deleting & recreating the cluster.
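
In case it helps anyone else, this is roughly how I checked the relevant CPU quotas (PROJECT_ID and the region are placeholders):

# both commands list quotas with their current usage and limits
gcloud compute project-info describe --project PROJECT_ID
gcloud compute regions describe us-central1 --project PROJECT_ID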

Finally, my own autoscalers messed with my deployment, reacting to how I was using the specified resources in between requests.

Thank you for your answers @GariSingh. I was also able to deploy up to 24 vCPUs once I removed the autoscalers and increased the quotas.
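
For completeness, the trimmed-down deployment that ended up working looks roughly like this (HPA, VPA and MultidimPodAutoscaler removed, image path redacted as above):

apiVersion: "apps/v1"
kind: "Deployment"
metadata:
  name: "backend-flask"
  namespace: "default"
  labels:
    app: "backend-flask"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: "backend-flask"
  template:
    metadata:
      labels:
        app: "backend-flask"
    spec:
      containers:
      - name: "backend-flask1"
        image: "{...}backend-flask:latest"
        resources:
          requests:
            memory: "6Gi"
            cpu: "24"    # worked once quotas were raised and the autoscalers were removed
          limits:
            memory: "32Gi"
            cpu: "24"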

Upvotes: 0
