dgebert

Reputation: 1484

Actuator Health Endpoint returns OUT_OF_SERVICE, when all groups are UP

I am trying to set up a readiness probe for my app deployed to k8s, but the actuator/health endpoint returns a different status than the actuator/health/readiness endpoint.

Importantly, this behaviour is only observed when the app is deployed to a k8s cluster.

So, without any additional config in the application.properties file, I get:

➜  ~ curl localhost:8080/actuator/health
{"status":"OUT_OF_SERVICE","groups":["liveness","readiness"]}
➜  ~ curl localhost:8080/actuator/health/liveness
{"status":"UP"}
➜  ~ curl localhost:8080/actuator/health/readiness
{"status":"OUT_OF_SERVICE"}

This seems correct: if the readiness state is OUT_OF_SERVICE, the health endpoint returns OUT_OF_SERVICE as well, because it includes the readiness group. This is at least consistent.

On the other hand, when I specify what should be included in the readiness group in the application.properties file, the results become inconsistent. In my case, I've added a single entry to my configuration file: management.endpoint.health.group.readiness.include=ping

This time, sending the same set of requests as before gives:

➜  ~ curl localhost:8080/actuator/health
{"status":"OUT_OF_SERVICE","groups":["liveness","readiness"]}
➜  ~ curl localhost:8080/actuator/health/liveness
{"status":"UP"}
➜  ~ curl localhost:8080/actuator/health/readiness
{"status":"UP"}

This is inconsistent: when both the liveness and readiness endpoints return UP, I'd expect the health endpoint to report the same status.

I am looking for an explanation of what I have misconfigured here, and why it works this way.

To make it easier, I've created a small app, where you can verify this behaviour on your cluster: https://github.com/gebertdominik/actuator-bug

Upvotes: 5

Views: 5143

Answers (1)

Andy Wilkinson

Reputation: 116111

As described in the documentation, the application is not ready to handle traffic until application and command-line runners have been called. Your command-line runner that calls your EventConsumer never returns so the application is never considered ready to handle traffic.
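A common fix, sketched here in plain Java rather than the actual Spring API (the `startApp` helper is hypothetical and merely simulates Spring calling a runner before marking the application ready), is to move the never-ending work onto a separate thread so the runner itself returns:

```java
// Hedged sketch: Spring Boot marks the application as ready to accept
// traffic only after every ApplicationRunner/CommandLineRunner has
// returned. If the runner's work loops forever, running it on a
// background thread lets the runner return immediately.
public class RunnerSketch {

    // Hypothetical stand-in for Spring's startup sequence: call the
    // runner, then flip readiness to UP.
    static String startApp(Runnable runner) {
        runner.run();  // Spring waits for every runner to return...
        return "UP";   // ...before readiness becomes ACCEPTING_TRAFFIC
    }

    public static void main(String[] args) {
        Runnable fixedRunner = () -> {
            Thread worker = new Thread(() -> {
                // long-running event consumption would go here
            });
            worker.setDaemon(true);
            worker.start(); // runner returns immediately
        };
        System.out.println(startApp(fixedRunner)); // UP
    }
}
```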

It's easier to see the effect that this has if you configure the health endpoint to always show details:

management.endpoint.health.show-details=always

The health endpoint now shows all of the individual components that are aggregated to produce the overall health:

{
    "components": {
        "diskSpace": {
            "details": {
                "exists": true,
                "free": 465064448000,
                "threshold": 10485760,
                "total": 1000240963584
            },
            "status": "UP"
        },
        "livenessState": {
            "status": "UP"
        },
        "ping": {
            "status": "UP"
        },
        "readinessState": {
            "status": "OUT_OF_SERVICE"
        }
    },
    "groups": [
        "liveness",
        "readiness"
    ],
    "status": "OUT_OF_SERVICE"
}

OUT_OF_SERVICE is returned due to the status of the readinessState component.
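The aggregation behind this can be sketched in plain Java (this is an illustration of the documented default ordering, not Spring's actual `StatusAggregator` implementation): statuses are ranked and the worst one wins.

```java
// Hedged sketch of Spring Boot's default status aggregation: the
// ordering below mirrors the documented default
// (DOWN, OUT_OF_SERVICE, UP, UNKNOWN), and the overall status is the
// highest-ranked (worst) status among the components.
import java.util.Comparator;
import java.util.List;

public class StatusAggregationSketch {
    static final List<String> ORDER =
            List.of("DOWN", "OUT_OF_SERVICE", "UP", "UNKNOWN");

    static String aggregate(List<String> componentStatuses) {
        return componentStatuses.stream()
                .min(Comparator.comparingInt(ORDER::indexOf))
                .orElse("UNKNOWN");
    }

    public static void main(String[] args) {
        // diskSpace, livenessState, ping, readinessState from the
        // detailed /actuator/health response above:
        List<String> components =
                List.of("UP", "UP", "UP", "OUT_OF_SERVICE");
        System.out.println(aggregate(components)); // OUT_OF_SERVICE
    }
}
```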

In its default configuration, the readiness group uses readinessState, so it too returns OUT_OF_SERVICE. By setting management.endpoint.health.group.readiness.include=ping, you have created your own custom readiness group that only includes the ping component. It now returns UP, which is consistent with the status of the ping component in the overall health response. As shown in the documentation, you should include readinessState when customizing the readiness group:

management.endpoint.health.group.readiness.include=readinessState,ping

Upvotes: 4
