Kubernetes: Load Balancer vs Readiness health check

Question

I'm running a WebService backend application in Kubernetes (GKE). It is used only by our frontend Web app. Typically there are sequences of tens of requests coming from the same user (ClientIP). My app is set up to run at least 2 instances ("minReplicas: 2").

The problem: From logs I can see situations when one pod is overloaded (receiving many requests) while the other is idle. Both pods being in Ready state.

My attempt to fix it: I tried to add a custom Readiness health check that returns "Unhealthy" status when there is too many open connections. But even after the health check returned "Unhealthy", load balancer sends further requests to the same pod while the second (healthy) pod is idle.

Here is an excerpt from service.yaml:

kind: Service
metadata:
  annotations:
    networking.gke.io/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080

sessionAffinity is not specified so I expect it is "None"

My questions: What am I doing wrong? Has the Readiness health check any effect on load balancer? How can I control requests distribution?

Additional information:

Cluster creation:

gcloud container --project %PROJECT% clusters create %CLUSTER% 
  --zone "us-east1-b" --release-channel "stable" --machine-type "n1-standard-2" 
  --disk-type "pd-ssd" --disk-size "20" --metadata disable-legacy-endpoints=true 
  --scopes "storage-rw" --num-nodes "1" --enable-stackdriver-kubernetes 
  --enable-ip-alias --network "xxx" --subnetwork "xxx" 
  --cluster-secondary-range-name "xxx" --services-secondary-range-name "xxx" 
  --no-enable-master-authorized-networks

Node Pool:

gcloud container node-pools create XXX --project %PROJECT% --zone="us-east1-b" 
  --cluster=%CLUSTER% --machine-type=c2-standard-4 --max-pods-per-node=16 
  --num-nodes=1 --disk-type="pd-ssd" --disk-size="10" --scopes="storage-full" 
  --enable-autoscaling --min-nodes=1 --max-nodes=30

Service:

apiVersion: v1
kind: Service
metadata:
  name: XXX
  annotations:
    networking.gke.io/load-balancer-type: "Internal"
  labels:
    app: XXX
    version: v0.1
spec:
  selector:
    app: XXX
    version: v0.1
  type: LoadBalancer
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080

HPA:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: XXX
spec:
  scaleTargetRef:
    apiVersion: "apps/v1"
    kind:       Deployment
    name:       XXX
  minReplicas: 2
  maxReplicas: 30
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 40
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70

Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: XXX
  labels:
    app: XXX
    version: v0.1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: XXX
      version: v0.1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0

  template:
    metadata:
      labels:
        app: XXX
        version: v0.1
    spec:
      containers:
      - image: XXX
        name: XXX
        imagePullPolicy: Always        
        resources:
          requests:
            memory: "10Gi"
            cpu: "3200m"
          limits:
            memory: "10Gi"
            cpu: "3600m"
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 8080
          initialDelaySeconds: 3
          periodSeconds: 8
          failureThreshold: 3                        
        livenessProbe:
          httpGet:
            path: /health/live
            port: 8080
          initialDelaySeconds: 120
          periodSeconds: 30   
      nodeSelector:
        cloud.google.com/gke-nodepool: XXX

NAME	Number of requests
nginx-7db7cf7c77-4ttqb	~33454
nginx-7db7cf7c77-dtwc8	~33208
nginx-7db7cf7c77-r6wv2	~33338

Kubernetes: Load Balancer vs Readiness health check

Answers (1)

Related Questions