Reputation: 3535
Problem:
Running helmfile apply seems to re-create the deployment, causing downtime, even though the update strategy is RollingUpdate.
While running helmfile apply I can observe that the deployment at first shows Ready 2/2 containers (before the apply), then for several seconds it shows Ready 0/0, after that Ready 1/2, and finally Ready 2/2.
The weird Ready 0/0 state made me think that the whole Deployment object gets re-created for some reason, which causes the downtime.
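One way to check whether the Deployment object itself is deleted and re-created (rather than updated in place) would be to compare its metadata before and after the apply; a minimal sketch, assuming kubectl access and using the my-api name and dev namespace that appear in the events further down:
# Record the Deployment's identity before running helmfile apply
kubectl get deployment my-api -n dev -o jsonpath='{.metadata.uid} {.metadata.creationTimestamp}{"\n"}'
# ...run helmfile apply...
# Run the same command again; a changed uid or creationTimestamp would mean
# the object was deleted and re-created rather than rolled
kubectl get deployment my-api -n dev -o jsonpath='{.metadata.uid} {.metadata.creationTimestamp}{"\n"}'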
Context:
My CD pipeline runs helmfile apply to deploy the app to the cluster. The app consists of multiple deployments.
When running helmfile diff, I can see that the only thing that gets changed is the image in the deployment's pod spec. However, it seems that the whole deployment gets re-created, and I'm not sure what can cause this behaviour or how to debug it.
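To rule out changes that helmfile diff might not be showing, the fully rendered manifests can be compared against the live objects with a server-side diff; a rough sketch, assuming the default helmfile.yaml and environment are picked up and the rendered manifests carry their namespace:
# Render everything helmfile would apply and let the API server diff it
# against the current cluster state
helmfile template | kubectl diff -f -
Anything beyond the image line in that output (for example the label selector, which is immutable on a Deployment) would be worth a closer look.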
I'm going to list all the info I have below:
The uid label on my deployment changes between re-deploys (not sure if it's relevant, since they might represent different revisions).
Here are the events during the redeploy (sorted by message):
LAST SEEN TYPE REASON OBJECT MESSAGE
15m Normal Killing Pod/my-api-7fc5fffdff-x7mqs Stopping container my-api
15m Normal ScalingReplicaSet Deployment/my-api Scaled up replica set my-api-9bd587876 to 1
15m Normal Killing Pod/my-api-7fc5fffdff-cpxt8 Stopping container my-api
15m (x10 over 24h) Normal NoPods PodDisruptionBudget/api-pdb-dev No matching pods found
15m (x5 over 24h) Normal SuccessfulRescale HorizontalPodAutoscaler/api-hpa-dev New size: 2; reason: Current number of replicas below Spec.MinReplicas
15m Normal ScalingReplicaSet Deployment/my-api Scaled up replica set my-api-9bd587876 to 2 from 1
14m Normal SuccessfulCreate ReplicaSet/my-api-9bd587876 Created pod: my-api-9bd587876-qcq52
14m Normal Scheduled Pod/my-api-9bd587876-qcq52 Successfully assigned dev/my-api-9bd587876-qcq52 to ip-10-0-31-232.ec2.internal
14m Normal Created Pod/my-api-9bd587876-qcq52 Created container my-api
14m Normal Pulled Pod/my-api-9bd587876-qcq52 Container image "repos-registry.example:my-api-1526051753" already present on machine
14m Normal Started Pod/my-api-9bd587876-qcq52 Started container my-api
14m (x12 over 24h) Warning FailedComputeMetricsReplicas HorizontalPodAutoscaler/api-hpa-dev invalid metrics (2 invalid out of 2), first error is: failed to get cpu resource metric value: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
14m (x12 over 24h) Warning FailedGetResourceMetric HorizontalPodAutoscaler/api-hpa-dev failed to get memory utilization: unable to get metrics for resource memory: no metrics returned from resource metrics API
14m Warning Unhealthy Pod/my-api-9bd587876-qcq52 Startup probe failed: Get "http://10.0.31.252:3000/healthz": dial tcp 10.0.31.252:3000: connect: connection refused
14m Warning FailedComputeMetricsReplicas HorizontalPodAutoscaler/api-hpa-dev invalid metrics (1 invalid out of 2), first error is: failed to get cpu resource metric value: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
14m (x13 over 24h) Warning FailedGetResourceMetric HorizontalPodAutoscaler/api-hpa-dev failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
14m Normal Pulled Pod/my-api-9bd587876-nqpxf Container image "repos-registry.example:my-api-1526051753" already present on machine
14m Normal SuccessfulCreate ReplicaSet/my-api-9bd587876 Created pod: my-api-9bd587876-nqpxf
14m Normal Scheduled Pod/my-api-9bd587876-nqpxf Successfully assigned dev/my-api-9bd587876-nqpxf to ip-10-0-31-232.ec2.internal
14m Normal Created Pod/my-api-9bd587876-nqpxf Created container my-api
14m Normal Started Pod/my-api-9bd587876-nqpxf Started container my-api
14m (x6 over 24h) Warning FailedComputeMetricsReplicas HorizontalPodAutoscaler/api-hpa-dev invalid metrics (1 invalid out of 2), first error is: failed to get cpu resource metric value: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
14m (x12 over 24h) Warning FailedGetResourceMetric HorizontalPodAutoscaler/api-hpa-dev failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
13m (x2 over 14m) Warning Unhealthy Pod/my-api-9bd587876-nqpxf Startup probe failed: Get "http://10.0.31.11:3000/healthz": dial tcp 10.0.31.11:3000: connect: connection refused
The deployment has the following probes set up:
readinessProbe:
  httpGet:
    path: /ready
    port: http
  failureThreshold: 2
  initialDelaySeconds: 10
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 10
livenessProbe:
  httpGet:
    path: /healthz
    port: http
  failureThreshold: 2
  initialDelaySeconds: 10
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 10
startupProbe:
  httpGet:
    path: /healthz
    port: http
  initialDelaySeconds: 15
  periodSeconds: 5
  failureThreshold: 30
  successThreshold: 1
  timeoutSeconds: 10
terminationGracePeriodSeconds: 45
The rolling update strategy:
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0
    maxSurge: 1
The HPA configuration:
minReplicas: 2
maxReplicas: 2
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 90
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 90
The PodDisruptionBudget:
minAvailable: 1
Could this be an issue with the probes? Or is it something else entirely?
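Two checks that might narrow this down; a sketch reusing the namespace, label and resource names from the events above:
# Probe failures show up in the pod events; this shows which probe fails and why
kubectl describe pod -n dev -l app.kubernetes.io/component=my-api
# The HPA warnings above point at the resource metrics API; these show whether
# metrics-server is registered and actually returning pod metrics
kubectl get apiservice v1beta1.metrics.k8s.io
kubectl top pods -n dev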
As David Maze suggested in the comments, the Ready 0/0 is just a presentation issue.
Below are the outputs of the --watch commands during a redeploy:
kubectl get deploy my-api --watch:
NAME READY UP-TO-DATE AVAILABLE AGE
my-api 2/2 2 2 3h20m
my-api 2/2 2 2 3h22m
my-api 0/1 0 0 0s
my-api 0/1 0 0 0s
my-api 0/1 0 0 0s
my-api 0/2 0 0 0s
my-api 0/2 0 0 0s
my-api 0/2 0 0 0s
my-api 1/2 1 1 90s
my-api 1/2 2 1 90s
my-api 2/2 2 2 116s
kubectl get pod -l app.kubernetes.io/component=my-api --watch:
NAME READY STATUS RESTARTS AGE
my-api-9bd587876-nqpxf 1/1 Running 0 3h21m
my-api-9bd587876-qcq52 1/1 Running 0 3h21m
my-api-9bd587876-qcq52 1/1 Terminating 0 3h21m
my-api-9bd587876-nqpxf 1/1 Terminating 0 3h21m
my-api-9bd587876-qcq52 0/1 Terminating 0 3h21m
my-api-9bd587876-nqpxf 0/1 Terminating 0 3h21m
my-api-9bd587876-qcq52 0/1 Terminating 0 3h21m
my-api-9bd587876-qcq52 0/1 Terminating 0 3h21m
my-api-9bd587876-nqpxf 0/1 Terminating 0 3h21m
my-api-9bd587876-nqpxf 0/1 Terminating 0 3h21m
my-api-5d7df89fdf-nhxp5 0/1 Pending 0 0s
my-api-5d7df89fdf-nhxp5 0/1 Pending 0 0s
my-api-5d7df89fdf-nhxp5 0/1 ContainerCreating 0 0s
my-api-5d7df89fdf-nhxp5 0/1 Running 0 1s
my-api-5d7df89fdf-nhxp5 0/1 Running 0 26s
my-api-5d7df89fdf-nhxp5 1/1 Running 0 26s
my-api-5d7df89fdf-rbknz 0/1 Pending 0 0s
my-api-5d7df89fdf-rbknz 0/1 Pending 0 0s
my-api-5d7df89fdf-rbknz 0/1 ContainerCreating 0 0s
my-api-5d7df89fdf-rbknz 0/1 Running 0 2s
my-api-5d7df89fdf-rbknz 0/1 Running 0 26s
my-api-5d7df89fdf-rbknz 1/1 Running 0 26s
This still, however, doesn't really look like a rolling update.
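One way to tell whether the controller is really doing a rolling update (old ReplicaSet scaled down only as the new one becomes ready, as maxUnavailable: 0 requires) would be to watch the ReplicaSets alongside the pods during the next deploy; a sketch using the same label as above:
# Old and new ReplicaSets should overlap while the rollout progresses
kubectl get rs -n dev -l app.kubernetes.io/component=my-api --watch
# Shows how the Deployment controller steps through the rollout, and the past revisions
kubectl rollout status deployment/my-api -n dev
kubectl rollout history deployment/my-api -n dev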
Upvotes: 1
Views: 89