Denis Yakovenko

Reputation: 3535

helmfile apply re-creates the deployment

Problem:

Running helmfile apply seems to re-create the deployment, causing downtime, even though the update strategy is RollingUpdate.

While helmfile apply is running, I can observe that the deployment at first shows Ready 2/2 (before the apply), then shows Ready 0/0 for several seconds, after that Ready 1/2, and finally Ready 2/2.

The odd Ready 0/0 state made me think that the whole Deployment object gets re-created for some reason, which would explain the downtime.


Context:

My CD pipeline runs helmfile apply to deploy the app to the cluster. The app consists of multiple deployments.

When running helmfile diff, I can see that the only thing that changes is the image in the deployment's pod spec. Nevertheless, the whole deployment seems to get re-created, and I'm not sure what could cause this behaviour or how to debug it.
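For reference, the diff comes from a plain helmfile diff; it can also be restricted to the one release (the release name my-api is a guess based on the objects below, so adjust if needed):

# show what will change (in my case, only the image tag in the pod spec)
helmfile diff

# the same, restricted to a single release via a selector
helmfile -l name=my-api diff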

I'll list all the info I have below. First, the events around the redeploy:

LAST SEEN            TYPE      REASON                         OBJECT                                                    MESSAGE
15m                  Normal    Killing                        Pod/my-api-7fc5fffdff-x7mqs                               Stopping container my-api
15m                  Normal    ScalingReplicaSet              Deployment/my-api                                         Scaled up replica set my-api-9bd587876 to 1
15m                  Normal    Killing                        Pod/my-api-7fc5fffdff-cpxt8                               Stopping container my-api
15m (x10 over 24h)   Normal    NoPods                         PodDisruptionBudget/api-pdb-dev                           No matching pods found
15m (x5 over 24h)    Normal    SuccessfulRescale              HorizontalPodAutoscaler/api-hpa-dev                       New size: 2; reason: Current number of replicas below Spec.MinReplicas
15m                  Normal    ScalingReplicaSet              Deployment/my-api                                         Scaled up replica set my-api-9bd587876 to 2 from 1
14m                  Normal    SuccessfulCreate               ReplicaSet/my-api-9bd587876                               Created pod: my-api-9bd587876-qcq52
14m                  Normal    Scheduled                      Pod/my-api-9bd587876-qcq52                                Successfully assigned dev/my-api-9bd587876-qcq52 to ip-10-0-31-232.ec2.internal
14m                  Normal    Created                        Pod/my-api-9bd587876-qcq52                                Created container my-api
14m                  Normal    Pulled                         Pod/my-api-9bd587876-qcq52                                Container image "repos-registry.example:my-api-1526051753" already present on machine
14m                  Normal    Started                        Pod/my-api-9bd587876-qcq52                                Started container my-api
14m (x12 over 24h)   Warning   FailedComputeMetricsReplicas   HorizontalPodAutoscaler/api-hpa-dev                       invalid metrics (2 invalid out of 2), first error is: failed to get cpu resource metric value: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
14m (x12 over 24h)   Warning   FailedGetResourceMetric        HorizontalPodAutoscaler/api-hpa-dev                       failed to get memory utilization: unable to get metrics for resource memory: no metrics returned from resource metrics API
14m                  Warning   Unhealthy                      Pod/my-api-9bd587876-qcq52                                Startup probe failed: Get "http://10.0.31.252:3000/healthz": dial tcp 10.0.31.252:3000: connect: connection refused
14m                  Warning   FailedComputeMetricsReplicas   HorizontalPodAutoscaler/api-hpa-dev                       invalid metrics (1 invalid out of 2), first error is: failed to get cpu resource metric value: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
14m (x13 over 24h)   Warning   FailedGetResourceMetric        HorizontalPodAutoscaler/api-hpa-dev                       failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
14m                  Normal    Pulled                         Pod/my-api-9bd587876-nqpxf                                Container image "repos-registry.example:my-api-1526051753" already present on machine
14m                  Normal    SuccessfulCreate               ReplicaSet/my-api-9bd587876                               Created pod: my-api-9bd587876-nqpxf
14m                  Normal    Scheduled                      Pod/my-api-9bd587876-nqpxf                                Successfully assigned dev/my-api-9bd587876-nqpxf to ip-10-0-31-232.ec2.internal
14m                  Normal    Created                        Pod/my-api-9bd587876-nqpxf                                Created container my-api
14m                  Normal    Started                        Pod/my-api-9bd587876-nqpxf                                Started container my-api
14m (x6 over 24h)    Warning   FailedComputeMetricsReplicas   HorizontalPodAutoscaler/api-hpa-dev                       invalid metrics (1 invalid out of 2), first error is: failed to get cpu resource metric value: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
14m (x12 over 24h)   Warning   FailedGetResourceMetric        HorizontalPodAutoscaler/api-hpa-dev                       failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
13m (x2 over 14m)    Warning   Unhealthy                      Pod/my-api-9bd587876-nqpxf                                Startup probe failed: Get "http://10.0.31.11:3000/healthz": dial tcp 10.0.31.11:3000: connect: connection refused
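(The events above were collected with something along these lines; the dev namespace is taken from the event messages.)

kubectl -n dev get events --sort-by=.lastTimestamp

The probes on the my-api container are configured like this: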
readinessProbe:
  httpGet:
    path: /ready
    port: http
  failureThreshold: 2
  initialDelaySeconds: 10
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 10
livenessProbe:
  httpGet:
    path: /healthz
    port: http
  failureThreshold: 2
  initialDelaySeconds: 10
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 10
startupProbe:
  httpGet:
    path: /healthz
    port: http
  initialDelaySeconds: 15
  periodSeconds: 5
  failureThreshold: 30
  successThreshold: 1
  timeoutSeconds: 10

# Deployment spec.strategy:
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0
    maxSurge: 1

# HorizontalPodAutoscaler spec:
minReplicas: 2
maxReplicas: 2
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 90
- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 90
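For reference, in the rendered Deployment manifest the strategy block is expected to sit directly under spec, next to the pod template, roughly like this (a sketch only: the labels and container layout are assumptions based on the events above, not my actual chart):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-api
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  selector:
    matchLabels:
      app: my-api                 # assumed label
  template:
    metadata:
      labels:
        app: my-api               # assumed label
    spec:
      containers:
        - name: my-api
          image: repos-registry.example:my-api-1526051753
          ports:
            - name: http          # the probes above target the named port "http"
              containerPort: 3000
          # readiness/liveness/startup probes from the snippet above go here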

Could this be an issue with the probes? Or is it something else entirely?
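For reference, the update strategy that the live Deployment object actually carries can be checked directly (namespace and name taken from the events above):

kubectl -n dev get deployment my-api -o jsonpath='{.spec.strategy}'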


UPDATE #1

As David Maze suggested in the comments, the Ready 0/0 is just a presentation issue. Below are the outputs of the --watch commands during a redeploy:
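(Roughly these two commands; the app label selector is a guess, since the chart may use different labels.)

kubectl -n dev get deployment my-api --watch
kubectl -n dev get pods -l app=my-api --watch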

NAME     READY   UP-TO-DATE   AVAILABLE   AGE
my-api   2/2     2            2           3h20m
my-api   2/2     2            2           3h22m
my-api   0/1     0            0           0s
my-api   0/1     0            0           0s
my-api   0/1     0            0           0s
my-api   0/2     0            0           0s
my-api   0/2     0            0           0s
my-api   0/2     0            0           0s
my-api   1/2     1            1           90s
my-api   1/2     2            1           90s
my-api   2/2     2            2           116s
NAME                      READY   STATUS              RESTARTS   AGE
my-api-9bd587876-nqpxf    1/1     Running             0          3h21m
my-api-9bd587876-qcq52    1/1     Running             0          3h21m
my-api-9bd587876-qcq52    1/1     Terminating         0          3h21m
my-api-9bd587876-nqpxf    1/1     Terminating         0          3h21m
my-api-9bd587876-qcq52    0/1     Terminating         0          3h21m
my-api-9bd587876-nqpxf    0/1     Terminating         0          3h21m
my-api-9bd587876-qcq52    0/1     Terminating         0          3h21m
my-api-9bd587876-qcq52    0/1     Terminating         0          3h21m
my-api-9bd587876-nqpxf    0/1     Terminating         0          3h21m
my-api-9bd587876-nqpxf    0/1     Terminating         0          3h21m
my-api-5d7df89fdf-nhxp5   0/1     Pending             0          0s
my-api-5d7df89fdf-nhxp5   0/1     Pending             0          0s
my-api-5d7df89fdf-nhxp5   0/1     ContainerCreating   0          0s
my-api-5d7df89fdf-nhxp5   0/1     Running             0          1s
my-api-5d7df89fdf-nhxp5   0/1     Running             0          26s
my-api-5d7df89fdf-nhxp5   1/1     Running             0          26s
my-api-5d7df89fdf-rbknz   0/1     Pending             0          0s
my-api-5d7df89fdf-rbknz   0/1     Pending             0          0s
my-api-5d7df89fdf-rbknz   0/1     ContainerCreating   0          0s
my-api-5d7df89fdf-rbknz   0/1     Running             0          2s
my-api-5d7df89fdf-rbknz   0/1     Running             0          26s
my-api-5d7df89fdf-rbknz   1/1     Running             0          26s

This still doesn't look like a rolling update, though: the Deployment's AGE resets to 0s and both old pods go to Terminating before the new ReplicaSet's pods are even scheduled, which is what I'd expect if the Deployment object itself were being deleted and re-created.
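To narrow this down, the next things I plan to compare are the strategy in the rendered release manifest versus the live object, and the Deployment's creation timestamp across applies (the release name my-api is a guess; a --force on the underlying helm upgrade, or anything else that replaces rather than patches the object, would be one explanation worth ruling out):

# strategy as rendered into the release
helm -n dev get manifest my-api | grep -A 4 'strategy:'

# if this timestamp changes on every helmfile apply, the Deployment object is being
# replaced rather than rolled
kubectl -n dev get deployment my-api -o jsonpath='{.metadata.creationTimestamp}'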

Upvotes: 1

Views: 89

Answers (0)
