Shardj

Reputation: 1969

Is there a way to run docker healthchecks early?

I use the following bash code to ensure that a docker container is up and healthy before I continue. However, my healthcheck interval is currently set to 30 seconds, so this loop will almost always wait the full 30 seconds even if the container is actually ready after just 1 second. I don't want to lower the interval because the healthcheck uses curl and I don't want to spam my access_log. Ideally, I'd like the first healthcheck to run after 5 seconds instead of 30, with 30-second intervals between checks after that.

maxcounter=60
counter=1
# Wait until the container is launched and healthy before continuing
until [ "$(docker inspect -f '{{.State.Health.Status}}' "$CONTAINER")" = "healthy" ]; do
    if [ "$counter" -gt "$maxcounter" ]; then
        echo "We have been waiting for the container ($CONTAINER) for too long already; failing."
        # printf is used because plain echo does not expand \n in bash
        printf '\nContainer state:\n'
        docker inspect -f '{{json .State}}' "$CONTAINER" | jq .
        printf '\nContainer logs:\n'
        docker logs "$CONTAINER"
        exit 1
    fi
    sleep 1
    counter=$((counter + 1))
done

Upvotes: 1

Views: 1161

Answers (2)

Shardj

Reputation: 1969

I personally decided to simply reduce my interval to 5 seconds instead of doing this, because I want to keep the retry functionality. But here's a rough script that you can put inside your docker container and set as the healthcheck command.

#!/bin/bash

function timedLock {
    lockfile-create /tmp/healthchecklock
    sleep 30
    lockfile-remove /tmp/healthchecklock
}

function healthcheckExit {
    # Write the response of the most recent healthcheck to a file
    echo -n "$1" > /tmp/healthcheckCode
    exit "$1"
}

# If lock exists then we've already run a healthcheck within the last 30 seconds
if lockfile-check /tmp/healthchecklock; then
    # Since we don't want to do a healthcheck we simply respond with the last exit code
    exit "$(cat /tmp/healthcheckCode)"
fi

# Create a lockfile so that we don't run another check for the next 30 seconds
timedLock &
# If the commands fail then we exit 1 (unhealthy), if they don't fail we exit 0 (healthy)
if cgi-fcgi -bind -connect localhost:9001 && curl --fail "http://localhost:80$HEALTHCHECK_PATH"; then
    healthcheckExit 0
else
    healthcheckExit 1
fi

Upvotes: 0

Aaron

Reputation: 24812

I don't believe you can have a variable healthcheck interval, so I would have the healthcheck command called every five seconds and let it decide whether to skip its actual check or execute it.

If you can use the procmail package's lockfile, I would implement it this way:

lockfile -0 -r 0 -l 30 /tmp/healthlock || exit 0
<actual healthcheck code>

That lockfile command creates a lockfile /tmp/healthlock that remains valid for 30 seconds.

The first time the script is called there should be no such file, so the command returns successfully and the script carries on with the actual healthcheck code.

The next 5 times (10-30 seconds after container startup, 5-25 seconds after the first lock's creation) the lockfile exists and is still valid, so the lockfile command returns a non-zero exit code and the script exits immediately.

The next time (35 seconds after container startup, 30 seconds after the first lock's creation) the first lock has expired, so the lockfile command creates a new one valid for another 30 seconds and returns a 0 exit code, letting the rest of your code execute.

One problem with this solution is that when the healthcheck command exits early because of the lock, its exit code is still taken into account for the health status. I don't know whether you can access the previously reported health status from within the healthcheck command, but if that's possible it would be better to use it as the exit code when the lockfile is present, so as not to erroneously report a healthy status when no healthcheck was actually attempted.
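One way to avoid that false "healthy" report is to have the script remember its own last result rather than ask Docker for it. The following is only a rough sketch under that assumption: it swaps the procmail lockfile for a plain timestamp file, and the function name throttled_check, the state file names, and the NOW override (there only so the clock can be faked when trying it out) are all made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: run the real check at most once every <interval> seconds;
# in between, replay the exit code of the last real check.
# Usage: throttled_check <state_dir> <interval_seconds> <command> [args...]
throttled_check() {
    local state_dir="$1"
    local interval="$2"
    shift 2
    local stamp_file="$state_dir/last_run"    # time of the last real check
    local code_file="$state_dir/last_code"    # exit code of the last real check
    local now last_run code
    now="${NOW:-$(date +%s)}"                 # NOW is an illustrative override

    if [ -f "$stamp_file" ] && [ -f "$code_file" ]; then
        last_run="$(cat "$stamp_file")"
        if [ $((now - last_run)) -lt "$interval" ]; then
            # Too soon since the last real check: report the cached result
            return "$(cat "$code_file")"
        fi
    fi

    # Record the time first, then run the real check and cache its exit code
    echo "$now" > "$stamp_file"
    "$@"
    code=$?
    echo "$code" > "$code_file"
    return "$code"
}
```

With Docker's interval set to 5 seconds, the healthcheck script could then end with something like `throttled_check /tmp 30 curl --fail "http://localhost:80$HEALTHCHECK_PATH"`, so the real request goes out at most every 30 seconds and the skipped runs repeat the last real result.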

Upvotes: 1
