hiroshi

Reputation: 7251

How to defrag resource utilization of GKE node with HPA and Cluster Autoscaler

Using HPA (Horizontal Pod Autoscaler) and Cluster Autoscaler on GKE, pods and nodes are scaled up as expected. However, when demand decreases, pods seem to be deleted from random nodes. This leaves nodes under-utilized, which is not cost effective...
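
For reference, the under-utilization can be observed per node (a sketch; kubectl top nodes assumes a metrics source, which GKE provides, and the grep pattern matches the usual kubectl describe output):

# Show current CPU/memory usage per node:
kubectl top nodes

# Show requested vs. allocatable resources per node:
kubectl describe nodes | grep -A 5 "Allocated resources"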

EDIT: The HPA is based on the single targetCPUUtilizationPercentage metric. I am not using VPA.

This is a redacted YAML file for the deployment and the HPA:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: foo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: foo
  template:
    metadata:
      labels:
        app: foo
    spec:
      containers:
      - name: c1
        resources:
          requests:
            cpu: 200m
            memory: 1.2G
      - name: c2
        resources:
          requests:
            cpu: 10m
        volumeMounts:
        - name: log-share
          mountPath: /mnt/log-share
      - name: c3
        resources:
          requests:
            cpu: 10m
          limits:
            cpu: 100m
        volumeMounts:
        - name: log-share
          mountPath: /mnt/log-share
      volumes:
      - name: log-share
        emptyDir: {}

---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: foo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: foo
  minReplicas: 1
  maxReplicas: 60
  targetCPUUtilizationPercentage: 80
...
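
For reference, the same autoscaling target expressed with the autoscaling/v2 API available on newer clusters (a sketch; the v1 targetCPUUtilizationPercentage maps to a CPU Utilization metric):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: foo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: foo
  minReplicas: 1
  maxReplicas: 60
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80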

EDIT2: Added an emptyDir volume to make the example valid.

How do I improve this situation?

There are some ideas, but none of them solve the issue completely...

Upvotes: 1

Views: 529

Answers (1)

hiroshi

Reputation: 7251

Sorry, I failed to mention the use of emptyDir (I have edited the YAML in the question).

As I commented on the question myself, I found "What types of pods can prevent CA from removing a node?" in the Cluster Autoscaler FAQ:

Pods with local storage. *

An emptyDir volume counts as local storage, so I needed to add the following annotation to the pod template of the deployment to mark the pods as safe to evict from less-utilized nodes.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: foo
spec:
  selector:
    matchLabels:
      app: foo
  template:
    metadata:
      labels:
        app: foo
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
    spec:
      ...

After specifying the annotation, the size of the GCE instance group backing the GKE node pool became smaller than before. I think it worked!
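
A quick way to verify (a sketch; app=foo is the label from the deployment above):

# Confirm the annotation is present on the running pods:
kubectl get pods -l app=foo -o yaml | grep safe-to-evict

# Watch the node count shrink as the Cluster Autoscaler consolidates:
kubectl get nodes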


Thank you to everyone who commented on the question!

Upvotes: 3
