Reputation: 3535
Problem:
Running helmfile apply seems to re-create the deployment, causing downtime, even though the update strategy is RollingUpdate.
While running helmfile apply I can observe that the deployment at first shows Ready 2/2 containers (before the apply), then for several seconds it shows Ready 0/0, after that Ready 1/2, and finally Ready 2/2.
The weird Ready 0/0 state made me think that the whole Deployment object gets re-created for some reason, which causes the downtime.
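One way to check whether the Deployment object itself is deleted and re-created (rather than updated in place) would be to compare its metadata before and after the apply; a minimal sketch, assuming kubectl access and using the my-api name and dev namespace that appear in the events further down:
# Record the Deployment's identity before running helmfile apply
kubectl get deployment my-api -n dev -o jsonpath='{.metadata.uid} {.metadata.creationTimestamp}{"\n"}'
# ...run helmfile apply...
# Run the same command again; a changed uid or creationTimestamp would mean
# the object was deleted and re-created rather than rolled
kubectl get deployment my-api -n dev -o jsonpath='{.metadata.uid} {.metadata.creationTimestamp}{"\n"}'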
Context:
My CD pipeline runs helmfile apply to deploy the app to the cluster. The app consists of multiple deployments.
When running helmfile diff, I can see that the only thing that gets changed is the image in the deployment's pod spec. However, it seems that the whole deployment gets re-created, and I'm not sure what can cause this behaviour or how to debug it.
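To rule out changes that helmfile diff might not be showing, the fully rendered manifests can be compared against the live objects with a server-side diff; a rough sketch, assuming the default helmfile.yaml and environment are picked up and the rendered manifests carry their namespace:
# Render everything helmfile would apply and let the API server diff it
# against the current cluster state
helmfile template | kubectl diff -f -
Anything beyond the image line in that output (for example the label selector, which is immutable on a Deployment) would be worth a closer look.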
I'm going to list all the info I have below:
The uid label on my deployment changes between re-deploys (not sure if it's relevant, since they might represent different revisions).
Here are the events during the redeploy (sorted by message):
LAST SEEN TYPE REASON OBJECT MESSAGE
15m Normal Killing Pod/my-api-7fc5fffdff-x7mqs Stopping container my-api
15m Normal ScalingReplicaSet Deployment/my-api Scaled up replica set my-api-9bd587876 to 1
15m Normal Killing Pod/my-api-7fc5fffdff-cpxt8 Stopping container my-api
15m (x10 over 24h) Normal NoPods PodDisruptionBudget/api-pdb-dev No matching pods found
15m (x5 over 24h) Normal SuccessfulRescale HorizontalPodAutoscaler/api-hpa-dev New size: 2; reason: Current number of replicas below Spec.MinReplicas
15m Normal ScalingReplicaSet Deployment/my-api Scaled up replica set my-api-9bd587876 to 2 from 1
14m Normal SuccessfulCreate ReplicaSet/my-api-9bd587876 Created pod: my-api-9bd587876-qcq52
14m Normal Scheduled Pod/my-api-9bd587876-qcq52 Successfully assigned dev/my-api-9bd587876-qcq52 to ip-10-0-31-232.ec2.internal
14m Normal Created Pod/my-api-9bd587876-qcq52 Created container my-api
14m Normal Pulled Pod/my-api-9bd587876-qcq52 Container image "repos-registry.example:my-api-1526051753" already present on machine
14m Normal Started Pod/my-api-9bd587876-qcq52 Started container my-api
14m (x12 over 24h) Warning FailedComputeMetricsReplicas HorizontalPodAutoscaler/api-hpa-dev invalid metrics (2 invalid out of 2), first error is: failed to get cpu resource metric value: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
14m (x12 over 24h) Warning FailedGetResourceMetric HorizontalPodAutoscaler/api-hpa-dev failed to get memory utilization: unable to get metrics for resource memory: no metrics returned from resource metrics API
14m Warning Unhealthy Pod/my-api-9bd587876-qcq52 Startup probe failed: Get "http://10.0.31.252:3000/healthz": dial tcp 10.0.31.252:3000: connect: connection refused
14m Warning FailedComputeMetricsReplicas HorizontalPodAutoscaler/api-hpa-dev invalid metrics (1 invalid out of 2), first error is: failed to get cpu resource metric value: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
14m (x13 over 24h) Warning FailedGetResourceMetric HorizontalPodAutoscaler/api-hpa-dev failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
14m Normal Pulled Pod/my-api-9bd587876-nqpxf Container image "repos-registry.example:my-api-1526051753" already present on machine
14m Normal SuccessfulCreate ReplicaSet/my-api-9bd587876 Created pod: my-api-9bd587876-nqpxf
14m Normal Scheduled Pod/my-api-9bd587876-nqpxf Successfully assigned dev/my-api-9bd587876-nqpxf to ip-10-0-31-232.ec2.internal
14m Normal Created Pod/my-api-9bd587876-nqpxf Created container my-api
14m Normal Started Pod/my-api-9bd587876-nqpxf Started container my-api
14m (x6 over 24h) Warning FailedComputeMetricsReplicas HorizontalPodAutoscaler/api-hpa-dev invalid metrics (1 invalid out of 2), first error is: failed to get cpu resource metric value: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
14m (x12 over 24h) Warning FailedGetResourceMetric HorizontalPodAutoscaler/api-hpa-dev failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
13m (x2 over 14m) Warning Unhealthy Pod/my-api-9bd587876-nqpxf Startup probe failed: Get "http://10.0.31.11:3000/healthz": dial tcp 10.0.31.11:3000: connect: connection refused
The deployment has the following probes set up:
readinessProbe:
  httpGet:
    path: /ready
    port: http
  failureThreshold: 2
  initialDelaySeconds: 10
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 10
livenessProbe:
  httpGet:
    path: /healthz
    port: http
  failureThreshold: 2
  initialDelaySeconds: 10
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 10
startupProbe:
  httpGet:
    path: /healthz
    port: http
  initialDelaySeconds: 15
  periodSeconds: 5
  failureThreshold: 30
  successThreshold: 1
  timeoutSeconds: 10
terminationGracePeriodSeconds: 45
The rolling update strategy:
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0
    maxSurge: 1
The HPA configuration:
minReplicas: 2
maxReplicas: 2
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 90
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 90
The PodDisruptionBudget:
minAvailable: 1
Could this be an issue with the probes? Or is it something else entirely?
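Two checks that might narrow this down; a sketch reusing the namespace, label and resource names from the events above:
# Probe failures show up in the pod events; this shows which probe fails and why
kubectl describe pod -n dev -l app.kubernetes.io/component=my-api
# The HPA warnings above point at the resource metrics API; these show whether
# metrics-server is registered and actually returning pod metrics
kubectl get apiservice v1beta1.metrics.k8s.io
kubectl top pods -n dev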
As David Maze suggested in the comments, the Ready 0/0 is just a presentation issue.
Below are the outputs of the --watch commands during a redeploy:
kubectl get deploy my-api --watch:
NAME READY UP-TO-DATE AVAILABLE AGE
my-api 2/2 2 2 3h20m
my-api 2/2 2 2 3h22m
my-api 0/1 0 0 0s
my-api 0/1 0 0 0s
my-api 0/1 0 0 0s
my-api 0/2 0 0 0s
my-api 0/2 0 0 0s
my-api 0/2 0 0 0s
my-api 1/2 1 1 90s
my-api 1/2 2 1 90s
my-api 2/2 2 2 116s
kubectl get pod -l app.kubernetes.io/component=my-api --watch:
NAME READY STATUS RESTARTS AGE
my-api-9bd587876-nqpxf 1/1 Running 0 3h21m
my-api-9bd587876-qcq52 1/1 Running 0 3h21m
my-api-9bd587876-qcq52 1/1 Terminating 0 3h21m
my-api-9bd587876-nqpxf 1/1 Terminating 0 3h21m
my-api-9bd587876-qcq52 0/1 Terminating 0 3h21m
my-api-9bd587876-nqpxf 0/1 Terminating 0 3h21m
my-api-9bd587876-qcq52 0/1 Terminating 0 3h21m
my-api-9bd587876-qcq52 0/1 Terminating 0 3h21m
my-api-9bd587876-nqpxf 0/1 Terminating 0 3h21m
my-api-9bd587876-nqpxf 0/1 Terminating 0 3h21m
my-api-5d7df89fdf-nhxp5 0/1 Pending 0 0s
my-api-5d7df89fdf-nhxp5 0/1 Pending 0 0s
my-api-5d7df89fdf-nhxp5 0/1 ContainerCreating 0 0s
my-api-5d7df89fdf-nhxp5 0/1 Running 0 1s
my-api-5d7df89fdf-nhxp5 0/1 Running 0 26s
my-api-5d7df89fdf-nhxp5 1/1 Running 0 26s
my-api-5d7df89fdf-rbknz 0/1 Pending 0 0s
my-api-5d7df89fdf-rbknz 0/1 Pending 0 0s
my-api-5d7df89fdf-rbknz 0/1 ContainerCreating 0 0s
my-api-5d7df89fdf-rbknz 0/1 Running 0 2s
my-api-5d7df89fdf-rbknz 0/1 Running 0 26s
my-api-5d7df89fdf-rbknz 1/1 Running 0 26s
This still, however, doesn't really look like a rolling update.
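One way to tell whether the controller is really doing a rolling update (old ReplicaSet scaled down only as the new one becomes ready, as maxUnavailable: 0 requires) would be to watch the ReplicaSets alongside the pods during the next deploy; a sketch using the same label as above:
# Old and new ReplicaSets should overlap while the rollout progresses
kubectl get rs -n dev -l app.kubernetes.io/component=my-api --watch
# Shows how the Deployment controller steps through the rollout, and the past revisions
kubectl rollout status deployment/my-api -n dev
kubectl rollout history deployment/my-api -n dev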
Upvotes: 1
Views: 89