Reputation: 3585
I create a Pod with a replica count of, say, 2, which runs an application (a simple web server); basically it's an always-running command. However, due to misconfiguration, the command sometimes exits and the Pod is then Terminated.
Due to the default restartPolicy of Always, the pod (and hence the container) is restarted and eventually the Pod status becomes CrashLoopBackOff.
If I do a kubectl describe deployment, it shows the Condition as Progressing=True and Available=False.
This looks fine, but the question is: how do I mark my deployment as 'failed' in the above case?
Adding spec.progressDeadlineSeconds doesn't seem to have an effect.
Will simply setting restartPolicy to Never in the Pod specification be enough?
A related question: is there a way of getting this information as a trigger/webhook, without doing a rollout status watch?
Upvotes: 2
Views: 12697
Reputation: 22148
Regarding your question:
How do I mark my deployment as 'failed' in the above case?
Kubernetes gives you two types of health checks:
1 ) Readiness
Readiness probes are designed to let Kubernetes know when your app is ready to serve traffic.
Kubernetes makes sure the readiness probe passes before allowing a service to send traffic to the pod.
If a readiness probe starts to fail, Kubernetes stops sending traffic to the pod until it passes.
2 ) Liveness
Liveness probes let Kubernetes know if your app is alive or dead.
If your app is alive, then Kubernetes leaves it alone. If your app is dead, Kubernetes kills the container and starts a new one to replace it.
At the moment (v1.19.0), Kubernetes supports three mechanisms for implementing liveness and readiness probes:
A ) ExecAction: Executes a specified command inside the container. The diagnostic is considered successful if the command exits with a status code of 0.
B ) TCPSocketAction: Performs a TCP check against the Pod's IP address on a specified port. The diagnostic is considered successful if the port is open.
C ) HTTPGetAction: Performs an HTTP GET request against the Pod's IP address on a specified port and path. The diagnostic is considered successful if the response has a status code greater than or equal to 200 and less than 400.
If the process in your container is able to crash on its own whenever it encounters an issue or becomes unhealthy, you do not necessarily need a liveness probe; the kubelet will automatically perform the correct action in accordance with the Pod's restartPolicy.
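For illustration, a minimal liveness probe using the TCPSocketAction mechanism might look like the sketch below (the port and timing values are assumptions for the example, not recommendations):
livenessProbe:
  tcpSocket:
    port: 8080          # assumed application port
  initialDelaySeconds: 15
  periodSeconds: 20
If the TCP check on that port starts failing, the kubelet kills the container and restarts it according to the Pod's restartPolicy.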
I think that in your case (the need to mark a deployment as succeeded or failed and take the proper action) you should:
Step 1:
Set up an HTTP/TCP readiness probe, for example:
readinessProbe:
  httpGet:
    path: /health-check
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 2
Where:
initialDelaySeconds - the number of seconds after the container has started before the readiness probe is initiated.
periodSeconds - how often (in seconds) to perform the readiness probe.
failureThreshold - the number of consecutive probe failures after which Kubernetes gives up and marks the Pod as not ready.
Step 2:
Choose the relevant rolling update strategy and decide how to handle failures of new pods (consider reading this thread for examples).
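As an example, a conservative RollingUpdate strategy that never removes a ready pod before its replacement passes the readiness probe could look like this (the maxSurge/maxUnavailable values are only an illustration):
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # allow at most one extra pod above the desired replica count
      maxUnavailable: 0    # never take a ready pod away before a new one is ready
With such settings, a new ReplicaSet whose pods crash-loop simply stalls the rollout instead of reducing availability, and the stall is eventually reported via progressDeadlineSeconds.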
A few references you can follow:
Container probes
Kubernetes Liveness and Readiness Probes
Kubernetes : Configure Liveness and Readiness Probes
Kubernetes and Containers Best Practices - Health Probes
Creating Liveness Probes for your Node.js application in Kubernetes
A deployment (or rather its rollout) is considered Failed if it keeps trying to deploy its newest ReplicaSet without ever completing, until the progressDeadlineSeconds interval is exceeded.
Kubernetes then updates the Deployment status with conditions like:
Conditions:
  Type            Status  Reason
  ----            ------  ------
  Available       True    MinimumReplicasAvailable
  Progressing     False   ProgressDeadlineExceeded
  ReplicaFailure  True    FailedCreate
Read more here.
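For reference, progressDeadlineSeconds sits at the top level of the Deployment spec; a sketch with an example value (the default is 600 seconds):
spec:
  progressDeadlineSeconds: 120   # report the rollout as not progressing after 2 minutes
  replicas: 2
Keep in mind that exceeding the deadline only sets the Progressing=False / ProgressDeadlineExceeded condition; Kubernetes does not stop or roll back the Deployment on its own, so your tooling has to watch for that condition.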
Upvotes: 4
Reputation: 4733
There is no Kubernetes concept for a "failed" deployment. Editing a deployment registers your intent that the new ReplicaSet is to be created, and k8s will repeatedly try to make that intent happen. Any errors that are hit along the way will cause the rollout to block, but they will not cause k8s to abort the deployment.
AFAIK, the best you can do (as of 1.9) is to apply a deadline to the Deployment, which will add a Condition that you can detect when a deployment gets stuck; see https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#failed-deployment and https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#progress-deadline-seconds.
It's possible to overlay your own definitions of failure on top of the statuses that k8s provides, but this is quite difficult to do in a generic way; see this issue for a (long!) discussion on the current status of this: https://github.com/kubernetes/kubernetes/issues/1899
Here's some Python code (using pykube) that I wrote a while ago that implements my own definition of 'ready'; I abort my deploy script if this condition does not obtain after 5 minutes.
# Imports and logger setup needed by the snippet below (the original post only showed the function).
import logging

from pykube import Pod

_log = logging.getLogger(__name__)


def _is_deployment_ready(d, deployment):
    # The deployment object itself must report ready...
    if not deployment.ready:
        _log.debug('Deployment not completed.')
        return False

    # ...with no surplus replicas left over from the old ReplicaSet...
    if deployment.obj["status"]["replicas"] > deployment.replicas:
        _log.debug('Old replicas not terminated.')
        return False

    # ...and every pod matched by the deployment's selector must be ready too.
    selector = deployment.obj['spec']['selector']['matchLabels']
    pods = Pod.objects(d.api).filter(namespace=d.namespace, selector=selector)
    if not pods:
        _log.info('No pods found.')
        return False

    for pod in pods:
        _log.info('Is pod %s ready? %s.', pod.name, pod.ready)
        if not pod.ready:
            _log.debug('Pod status: %s', pod.obj['status'])
            return False

    _log.info('All pods ready.')
    return True
Note the individual pod check, which is required because a deployment seems to be deemed 'ready' when the rollout completes (i.e. all pods are created), not when all of the pods are ready.
Upvotes: 1