Jorrit Salverda

Reputation: 765

Kubernetes deployment incurs downtime

When running a deployment I get downtime: requests start failing after a variable amount of time (20-40 seconds).

The preStop hook for the entry (haproxy) container sends SIGUSR1, which makes its readiness check fail, waits 31 seconds, then sends SIGTERM. In that window the pod should be removed from the service, because the readiness check is configured to fail after 2 failed attempts at 5 second intervals; that means the pod should be marked unready and drop out of the load balancer roughly 10 seconds after SIGUSR1, well before SIGTERM is sent at the 31 second mark.

How can I see the events for pods being added to and removed from the service, to find out what's causing this?

And events around the readiness checks themselves?

I'm running Google Container Engine version 1.2.2 with GCE's network load balancer.
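To make it concrete, this is roughly the visibility I'm after during a rollout (a rough sketch; the service name matches the manifests below):

# Watch the service's endpoints object; a pod's IP should disappear from
# the list once its readiness probe has failed twice.
kubectl get endpoints myapp --watch

# Stream cluster events while the rolling update runs.
kubectl get events --watch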

service:

apiVersion: v1
kind: Service
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  type: LoadBalancer
  ports:
  - name: http
    port: 80
    targetPort: http
    protocol: TCP
  - name: https
    port: 443
    targetPort: https
    protocol: TCP  
  selector:
    app: myapp

deployment:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
        version: 1.0.0-61--66-6
    spec:
      containers:
      - name: myapp
        image: ****  
        resources:
          limits:
            cpu: 100m
            memory: 250Mi
          requests:
            cpu: 10m
            memory: 125Mi
        ports:
        - name: http-direct
          containerPort: 5000
        livenessProbe:
          httpGet:
            path: /status
            port: 5000
          initialDelaySeconds: 30
          timeoutSeconds: 1
        lifecycle:
          preStop:
            exec:
              # SIGTERM triggers a quick exit; gracefully terminate instead
              command: ["sleep 31;"]
      - name: haproxy
        image: travix/haproxy:1.6.2-r0
        imagePullPolicy: Always
        resources:
          limits:
            cpu: 100m
            memory: 100Mi
          requests:
            cpu: 10m
            memory: 25Mi
        ports:
        - name: http
          containerPort: 80
        - name: https
          containerPort: 443
        env:
        - name: "SSL_CERTIFICATE_NAME"
          value: "ssl.pem"         
        - name: "OFFLOAD_TO_PORT"
          value: "5000"
        - name: "HEALT_CHECK_PATH"
          value: "/status"
        volumeMounts:
        - name: ssl-certificate
          mountPath: /etc/ssl/private
        livenessProbe:
          httpGet:
            path: /status
            port: 443
            scheme: HTTPS
          initialDelaySeconds: 30
          timeoutSeconds: 1
        readinessProbe:
          httpGet:
            path: /readiness
            port: 81
          initialDelaySeconds: 0
          timeoutSeconds: 1
          periodSeconds: 5
          successThreshold: 1
          failureThreshold: 2
        lifecycle:
          preStop:
            exec:
              # SIGTERM triggers a quick exit; gracefully terminate instead
              command: ["kill -USR1 1; sleep 31; kill 1"]
      volumes:
      - name: ssl-certificate
        secret:
          secretName: ssl-c324c2a587ee-20160331
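For what it's worth, the same signal the preStop hook sends can also be triggered by hand to watch how the readiness check and the endpoints react (the pod name below is a placeholder):

# Mirror the haproxy preStop hook: send SIGUSR1 to PID 1 of the haproxy
# container in one of the running pods, then watch the endpoints list.
kubectl exec <some-myapp-pod> -c haproxy -- kill -USR1 1
kubectl get endpoints myapp --watch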

Upvotes: 0

Views: 695

Answers (1)

janetkuo

Reputation: 2835

When a probe fails, the prober emits a warning event with reason Unhealthy and a message of the form xx probe errored: xxx.

You should be able to find those events using either kubectl get events or kubectl describe pods -l app=myapp,version=1.0.0-61--66-6 (filtering the pods by their labels).
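For example (the label selector just mirrors the pod labels from the question; adjust it to match your deployment):

# List recent cluster events and pick out the probe warnings.
kubectl get events | grep Unhealthy

# Or describe the deployment's pods; the Events section at the end of the
# output includes the readiness and liveness probe failures.
kubectl describe pods -l app=myapp,version=1.0.0-61--66-6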

Upvotes: 1
