Reputation: 1484
I am trying to set up a readiness probe for my app deployed to k8s, but the actuator/health endpoint reports a different status than the actuator/health/readiness endpoint.
It's important to note that this behaviour is only observed when the app is deployed to a k8s cluster.
Without any additional config in the application.properties file I get:
➜ ~ curl localhost:8080/actuator/health
{"status":"OUT_OF_SERVICE","groups":["liveness","readiness"]}%
➜ ~ curl localhost:8080/actuator/health/liveness
{"status":"UP"}%
➜ ~ curl localhost:8080/actuator/health/readiness
{"status":"OUT_OF_SERVICE"}%
This seems correct - if the readiness state is OUT_OF_SERVICE, the health endpoint returns OUT_OF_SERVICE as well, because it includes the readiness group. This is at least consistent.
On the other hand, when I specify what should be included in the readiness group in the application.properties file, the results seem inconsistent. In my case I've added a single entry to my configuration file:
management.endpoint.health.group.readiness.include=ping
This time, this is what I get when sending the same set of requests as before:
➜ ~ curl localhost:8080/actuator/health
{"status":"OUT_OF_SERVICE","groups":["liveness","readiness"]}%
➜ ~ curl localhost:8080/actuator/health/liveness
{"status":"UP"}%
➜ ~ curl localhost:8080/actuator/health/readiness
{"status":"UP"}%
This is inconsistent - when both the liveness and readiness endpoints return status UP, I'd expect to see the same status in the health endpoint.
I am looking for an explanation of what I have misconfigured here and why it works this way.
To make it easier, I've created a small app, where you can verify this behaviour on your cluster: https://github.com/gebertdominik/actuator-bug
Upvotes: 5
Views: 5143
Reputation: 116111
As described in the documentation, the application is not ready to handle traffic until application and command-line runners have been called. Your command-line runner that calls your EventConsumer never returns, so the application is never considered ready to handle traffic.
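For illustration only, here is a minimal sketch of the kind of runner that causes this. The infinite loop is just a stand-in for the EventConsumer call in the linked repository, not its actual code:

import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;

@Component
class BlockingRunner implements CommandLineRunner {

    @Override
    public void run(String... args) throws Exception {
        // Stand-in for the blocking EventConsumer call: this loop never returns,
        // so run() never completes, ApplicationReadyEvent is never published, and
        // the readinessState indicator stays OUT_OF_SERVICE.
        while (true) {
            Thread.sleep(1000);
        }
    }
}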
It's easier to see the effect that this has if you configure the health endpoint to always show details:
management.endpoint.health.show-details=always
The health endpoint now shows all of the individual components that are aggregated to produce the overall health:
{
  "components": {
    "diskSpace": {
      "details": {
        "exists": true,
        "free": 465064448000,
        "threshold": 10485760,
        "total": 1000240963584
      },
      "status": "UP"
    },
    "livenessState": {
      "status": "UP"
    },
    "ping": {
      "status": "UP"
    },
    "readinessState": {
      "status": "OUT_OF_SERVICE"
    }
  },
  "groups": [
    "liveness",
    "readiness"
  ],
  "status": "OUT_OF_SERVICE"
}
OUT_OF_SERVICE is returned due to the status of the readinessState component: the overall status is computed by the default status aggregator, which ranks OUT_OF_SERVICE above UP, so a single OUT_OF_SERVICE component makes the aggregate OUT_OF_SERVICE.
In its default configuration, readinessState is used by the readiness group and it too returns OUT_OF_SERVICE. By setting management.endpoint.health.group.readiness.include=ping, you have created your own custom readiness group that only includes the ping component. It now returns UP, which is consistent with the status of the ping component in the overall health response. As shown in the documentation, you should include readinessState when customizing the readiness group:
management.endpoint.health.group.readiness.include=readinessState,ping
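With readinessState back in the group (and the blocking runner still in place), /actuator/health/readiness should again report OUT_OF_SERVICE until the runner returns, matching the status shown by /actuator/health.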
Upvotes: 4