user2366975

Reputation: 4700

Docker swarm service not starting when healthcheck url points to itself

Why does Traefik fail to start when its healthcheck URL points at Traefik itself? With a Google URL as the healthcheck target it works, but with Traefik's own domain the service does not even start.

I followed the example from here: https://github.com/jakubhajek/traefik-swarm-mastery/blob/master/stack-tr-main.yml

version: "3.8"

services:
  traefik:
    image: "traefik:v2.4.7"
    healthcheck:
#      test: wget --quiet --tries=1 --spider https://google.com || exit 1 # WORKS
      test: wget --quiet --tries=1 --spider https://traefik.mydomain.com/ping || exit 1 # FAILS
      interval: 3s
      timeout: 5s
      retries: 3
      start_period: 5s
    deploy:
      replicas: 1
      placement:
        constraints: [node.role == manager]
      labels:
        - "traefik.enable=true"
        #___ROUTER
        - "traefik.http.routers.traefik.rule=Host(`traefik.mydomain.com`)"
        - "traefik.http.routers.traefik.entrypoints=websecure"
        - "traefik.http.routers.traefik.service=api@internal"
        - "traefik.http.routers.traefik.tls.certresolver=le"
        #___ROUTER ping
        - "traefik.http.routers.ping.rule=Host(`traefik.mydomain.com`) && Path(`/ping`)"
        - "traefik.http.routers.ping.service=ping@internal"
        - "traefik.http.routers.ping.tls.certresolver=le"
        #___ Use these middlewares
        - "traefik.http.services.traefik.loadbalancer.server.port=8080"
    command:
      - "--log.level=DEBUG"
      - "--api=true"
#      - "--api.insecure=true"
      - "--api.dashboard=true"
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--providers.docker.swarmmode"
#      - "--providers.docker.network=default"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.web.http.redirections.entryPoint.to=websecure"
      - "--entrypoints.web.http.redirections.entryPoint.scheme=https"
      - "--entrypoints.websecure.address=:443"
      - "--entrypoints.websecure.http.tls.certResolver=le"
#      - "--entrypoints.ping.address=:8080"
#      - "--entrypoints.ping.http.tls.certResolver=le"
      - "--log=true"
      - "--log.filePath=/logs/traefik.log"
      - "--ping=true"
#      - "--ping.entryPoint=ping"
      - "--accesslog=true"
      - "--accesslog.filePath=/logs/access.log"
      - "[email protected]"
    ports:
      - "80:80"
      - "443:443"
#      - "8080:8080"
#      - "8082:8082"
    environment:
      TZ: 'Europe/Berlin'
    volumes:
      - "./letsencrypt:/letsencrypt"
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
      - "/etc/localtime:/etc/localtime:ro"
      - "$TRAEFIK_DATA_DIR:/logs"
    networks:
      - mystacknet

On failure, the output of docker service ls shows 0/1 replicas. On success (using the placeholder Google URL) it shows 1/1 and the dashboard is accessible. The Traefik logs do not show an error.

ID                  NAME                           MODE                REPLICAS            IMAGE                     PORTS                                                    
yghvyj2l7ri2        mystack-traefik_traefik        replicated          0/1                 traefik:v2.4.7            *:80->80/tcp, *:443->443/tcp

Upvotes: 0

Views: 405

Answers (1)

mrq

Reputation: 567

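As a general rule, a healthcheck should run against one specific container (a task, in Swarm jargon) without involving other components such as load balancers, Interlock, etc. The best way to do this is to point it directly at localhost/127.0.0.1. A check against the public domain instead depends on DNS, the published ports, and Traefik's own routing and TLS all working while the task is still starting.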

Moreover, the /ping endpoint is designed to serve as a so-called liveness probe: it tells Swarm or another orchestrator that the container is "UP" at a very early stage of the container's life.

It is therefore safer to remove every unnecessary step from the check, for instance by avoiding TLS certificates where possible.

Putting it all together, you can probably get your healthcheck working like this:

  test: wget --quiet --tries=1 --spider http://traefik.mydomain.com:8080/ping || exit 1

(i.e. using http instead of https, and the port associated with plain http, which is 8080). Far better is to refer to localhost:

  test: wget --quiet --tries=1 --spider http://127.0.0.1:8080/ping || exit 1

Even better is to use the dedicated Traefik CLI command that has been made available for this purpose:

  test: traefik healthcheck --ping

My suggestion is to use this last one.
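
For context, here is a minimal sketch of how that last variant might sit in the stack file from the question. The timing values are copied from the question. The exec-form list for test is my own choice rather than something the question used, and the other command flags are left out for brevity, so treat this as an illustration under those assumptions rather than a drop-in file:

```yaml
version: "3.8"

services:
  traefik:
    image: "traefik:v2.4.7"
    command:
      # The ping endpoint must be enabled for the healthcheck to have a target.
      # (All other flags from the question are omitted here for brevity.)
      - "--ping=true"
    healthcheck:
      # Runs inside the container, so no public DNS, published port,
      # or TLS certificate is involved (exec form is an assumption).
      test: ["CMD", "traefik", "healthcheck", "--ping"]
      interval: 3s
      timeout: 5s
      retries: 3
      start_period: 5s
```

With a check like this, the ping router labels on the public domain should no longer be needed for the healthcheck itself, although you may still want them for external monitoring.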

I would add that most, if not all, of these considerations about healthchecks also apply when using docker-compose instead of Swarm.

Upvotes: 1
