Robbie Milejczak

Reputation: 5770

Determine why docker image fails to run in swarm mode but works via compose with the same yml file

I have the following docker-compose.yml:

version: '3.7'
services:
  gateway:
    image: rmilejcz/kalos-gateway:latest
    deploy:
      replicas: 1
    ports:
      - '443:443'
    networks:
      - rpcnet
  rpc:
    image: rmilejcz/kalos-rpc:latest
    deploy:
      replicas: 1
    ports:
      - '8418:8418'
    networks:
      - rpcnet
  proxy:
    image: rmilejcz/grpcwebproxy:latest
    deploy:
      replicas: 1
    ports:
      - '8080:8080'
    networks:
      - rpcnet
networks:
  rpcnet:

It is essentially an rpc server with two separate reverse proxies: gateway translates normal HTTP requests and forwards them to rpc, and proxy translates gRPC-web requests and forwards them to rpc.

When I run this via docker-compose up it works as expected, which is easily confirmed by running:

curl localhost:443/v1/lookup/vendor

However when I try to run this in a swarm:

docker swarm init
docker deploy --compose-file docker-compose.yml test
# OR
docker stack deploy --compose-file docker-compose.yml test

The previously working curl example returns:

all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp: lookup rpc on 127.0.0.11:53: no such host\"

meaning that the rpc service is not available. Not sure where 127.0.0.11:53 comes from, I'm guessing 127.0.0.11 is what rpc resolves to but I'm not sure what :53 is derived from.

docker service ls test_rpc shows REPLICAS at 0/1. I'm almost certain that for whatever reason, the rpc service fails to bind to rpc:8418 because if I change that to localhost:8418 and run docker service ls test_rpc I can see that REPLICAS is at 1/1. However, I am still unable to communicate with that service via either proxy because of the same error as above (all SubConns are in TransientFailure).

Am I making a bad assumption about container communication within a docker swarm? Is there any way for me to get detailed error information from the rpc service to determine exactly why it is failing? If I run docker-compose up I can see the services' stdout in my terminal. Is there some similar capability for docker swarm?

Upvotes: 0

Views: 742

Answers (2)

BMitch

Reputation: 263617

docker service ls test_rpc shows REPLICAS at 0/1. I'm almost certain that for whatever reason, the rpc service fails to bind to rpc:8418 because if I change that to localhost:8418 and run docker service ls test_rpc I can see that REPLICAS is at 1/1

This sounds like you are talking about the application binding to the port. It should not bind to rpc:8418 because, by default in Swarm mode, that name resolves to a VIP (virtual IP) that gets routed to each healthy container. Instead, configure your application to bind to 0.0.0.0:8418, which tells it to listen on all interfaces attached to the container for incoming requests.
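
One way to look at the VIP mentioned above is to ask Swarm which virtual IPs it assigned to the service. This is only a sketch: it assumes the stack was deployed as test, so the service is named test_rpc, and the helper name service_vips is hypothetical.

```shell
# Hypothetical helper: print the virtual IPs Swarm assigned to a service.
# Run it on a manager node; the argument is the full service name.
service_vips() {
  docker service inspect --format '{{json .Endpoint.VirtualIPs}}' "$1"
}
```

Calling service_vips test_rpc should list one address per attached network. If rpc resolves to one of those addresses, that would be consistent with the application being unable to bind to rpc:8418.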

Upvotes: 1

Haider Jafri

Reputation: 66

127.0.0.11:53 is the address of Docker's embedded DNS server (port 53 is the standard DNS port). The rpc service is crashing or not starting, so the gateway service cannot forward requests to the host rpc: no such service is running on the network, and gateway's DNS lookup for rpc returns "no such host".
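
To find out why the rpc task is not starting, Swarm has rough counterparts to the output that docker-compose up prints. A minimal sketch, assuming the stack was deployed as test (so the service is named test_rpc); the helper name swarm_debug is hypothetical.

```shell
# Hypothetical helper: show task state and recent logs for one Swarm service.
swarm_debug() {
  service="$1"
  # Task history, with error messages left untruncated.
  docker service ps --no-trunc "$service"
  # Last 50 lines of stdout/stderr collected from the service's tasks.
  docker service logs --tail 50 "$service"
}
```

Running swarm_debug test_rpc on a manager node should show the ERROR column for each failed task, followed by whatever the container wrote before it exited.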

Upvotes: 1
