Shivraj

Reputation: 492

pod readinessprobe issue with database and container

I have an application deployed to Kubernetes. Here is the tech stack: Java 11, Spring Boot 2.3.x or 2.5.x, using HikariCP 3.x or 4.x.

I'm using Spring Boot Actuator for health checks. Here is the liveness and readiness configuration within application.yaml:

  endpoint:
    health:
      group:
        liveness:
          include: '*'
          exclude:
            - db
            - readinessState
        readiness:
          include: '*'
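
For context, this snippet normally sits under the standard management: prefix; a minimal sketch of the surrounding structure (the management: parent and the probes.enabled flag are assumptions on my part, not part of the original file) would be:

management:
  endpoint:
    health:
      # Only needed when running outside Kubernetes; on Kubernetes, Spring Boot 2.3+
      # exposes /actuator/health/liveness and /actuator/health/readiness automatically.
      probes:
        enabled: true
      group:
        liveness:
          include: '*'
          exclude:
            - db
            - readinessState
        readiness:
          include: '*'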

What this does if the DB is down:

  1. Ensures liveness isn't impacted, meaning the application container keeps running even during a DB outage.
  2. Readiness is impacted, ensuring no traffic is routed to the container.

Liveness and readiness probe configuration in the container spec:

livenessProbe:
  httpGet:
    path: actuator/health/liveness
    port: 8443
    scheme: HTTPS
  initialDelaySeconds: 30
  periodSeconds: 30
  timeoutSeconds: 5
readinessProbe:
  httpGet:
    path: actuator/health/readiness
    port: 8443
    scheme: HTTPS
  initialDelaySeconds: 30
  periodSeconds: 30
  timeoutSeconds: 20
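
For reference, the two fields the analysis below leans on are left at their Kubernetes defaults here; written out explicitly (the values shown are the defaults, not something set in the original spec), the readiness probe is equivalent to:

readinessProbe:
  httpGet:
    path: actuator/health/readiness
    port: 8443
    scheme: HTTPS
  initialDelaySeconds: 30
  periodSeconds: 30
  timeoutSeconds: 20
  successThreshold: 1    # default: one success marks the container Ready again
  failureThreshold: 3    # default: three consecutive failures mark the pod NotReady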

My application started and ran fine for a few hours.

What I did:

I brought down the DB.

Issue Noticed:

When the DB is down, after 90+ seconds I see 3 more pods being spun up. When a pod is described, I see status and conditions like below:

Status:       Running
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True

When I list all running pods:

NAME                                                  READY   STATUS    RESTARTS   AGE
application-a-dev-deployment-success-5d86b4bcf4-7lsqx    0/1     Running   0          6h48m
application-a-dev-deployment-success-5d86b4bcf4-cmwd7    0/1     Running   0          49m
application-a-dev-deployment-success-5d86b4bcf4-flf7r    0/1     Running   0          48m
application-a-dev-deployment-success-5d86b4bcf4-m5nk7    0/1     Running   0          6h48m
application-a-dev-deployment-success-5d86b4bcf4-tx4rl    0/1     Running   0          49m

My Analysis/Findings:

Per the readinessProbe configuration, periodSeconds is set to 30 seconds and failureThreshold defaults to 3 per the Kubernetes documentation.

Per application.yaml, the readiness group includes the db check, meaning the readiness check fails every 30 seconds once the DB is down. When it fails 3 times, the failureThreshold is met and it spins up new pods.

The restart policy defaults to Always.

Questions:

  1. Why did it spin up new pods?
  2. Why did it spin up exactly 3 pods, and not 1, 2, 4, or any other number?
  3. Does this have anything to do with restartPolicy?

Upvotes: 2

Views: 3370

Answers (3)

ajay

Reputation: 1

A readiness probe doesn't restart the pod; it just marks its Ready condition as false.

failureThreshold: When a probe fails, Kubernetes will try failureThreshold times before giving up. Giving up in case of liveness probe means restarting the container. In case of readiness probe the Pod will be marked Unready. Defaults to 3. Minimum value is 1.

That said, you should also consider the liveness probe, which actually restarts the container in a similar situation: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#:~:text=Play%20with%20Kubernetes-,Define%20a%20liveness%20command,-Many%20applications%20running

Upvotes: 0

Shivraj

Reputation: 492

The crux lay in the HPA. After a readiness failure, the pod's CPU utilization would jump up, and as it went above 70%, the HPA was triggered and started those 3 pods.
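
For illustration, a minimal sketch of an HPA with such a 70% CPU target (the HPA name, API version, and replica bounds are assumptions for the sketch, not the actual manifest):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: application-a-dev-hpa                 # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: application-a-dev-deployment-success
  minReplicas: 2                              # assumed: matches the 2 original pods
  maxReplicas: 5                              # assumed: matches the 5 pods seen after scale-out
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70              # scale out once average CPU utilization crosses 70%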

Upvotes: 0

Bazhikov

Reputation: 841

  1. As you answered yourself, it spun up new pods after 3 tries, according to failureThreshold. You can change your restartPolicy to OnFailure, which will restart the container only if it fails, or to Never if you don't want the container to be restarted at all (see the sketch after this list). The difference between the policies can be found here. Note this:

The restartPolicy applies to all containers in the Pod. restartPolicy only refers to restarts of the containers by the kubelet on the same node. After containers in a Pod exit, the kubelet restarts them with an exponential back-off delay (10s, 20s, 40s, …), that is capped at five minutes. Once a container has executed for 10 minutes without any problems, the kubelet resets the restart backoff timer for that container.

  2. Share your full Deployment file; I suppose that you've set the replicas number to 3.

  3. Answered in the answer to the 1st question.
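
As a minimal sketch of where restartPolicy sits, assuming a bare Pod manifest (note that restartPolicy is a pod-level field, and Pods managed by a Deployment only accept Always):

apiVersion: v1
kind: Pod
metadata:
  name: application-a-pod                           # hypothetical name
spec:
  restartPolicy: OnFailure                          # Always (default) | OnFailure | Never
  containers:
    - name: application-a
      image: registry.example.com/application-a:1.0 # hypothetical image
      ports:
        - containerPort: 8443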

Also note this, if this works for you:

Startup probes are useful for Pods that have containers that take a long time to come into service. Rather than set a long liveness interval, you can configure a separate configuration for probing the container as it starts up, allowing a time longer than the liveness interval would allow.

If your container usually starts in more than initialDelaySeconds + failureThreshold × periodSeconds, you should specify a startup probe that checks the same endpoint as the liveness probe. The default for periodSeconds is 10s. You should then set its failureThreshold high enough to allow the container to start, without changing the default values of the liveness probe. This helps to protect against deadlocks.
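
For example, a minimal startup probe reusing the same liveness endpoint as above (the failureThreshold of 30 is an illustrative value, not something from the original spec):

startupProbe:
  httpGet:
    path: actuator/health/liveness
    port: 8443
    scheme: HTTPS
  periodSeconds: 10        # Kubernetes default period
  failureThreshold: 30     # allows up to 30 x 10s = 300s of startup time before the liveness probe takes over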

Upvotes: 1
