firelyu

Reputation: 2232

The "--cluster-store" and "--cluster-advertise" don't work

I am trying to set up a Docker cluster with Swarm and Consul. I have three machines: manager, host1, and host2.
I run the Consul and Swarm manager containers on the manager:

$ docker run --rm -p 8500:8500 progrium/consul -server -bootstrap
$ docker run -d -p 2377:2375 swarm manage consul://<manager>:8500

On host1 and host2, I modify the daemon options with --cluster-store and --cluster-advertise, and restart the Docker daemon.

host1
DOCKER_OPTS="--cluster-store=consul://<manager>:8500 --cluster-advertise=<host1>:2375"
host2
DOCKER_OPTS="--cluster-store=consul://<manager>:8500 --cluster-advertise=<host2>:2375"

When I join host1 and host2 to the swarm, it fails.

host1 $ docker run --rm swarm join --advertise=<host1>:2375 consul://<manager>:8500
host2 $ docker run --rm swarm join --advertise=<host2>:2375 consul://<manager>:8500

The Swarm manager log shows the following errors:

time="2016-01-20T02:17:17Z" level=error msg="Get http://<host1>:2375/v1.15/info: dial tcp <host1>:2375: getsockopt: connection refused"
time="2016-01-20T02:17:20Z" level=error msg="Get http://<host2>:2375/v1.15/info: dial tcp <host2>:2375: getsockopt: connection refused"

Upvotes: 7

Views: 6148

Answers (3)

vanquangthanhhao

Reputation: 1

Please remove docker.pid and docker.sock in /var/run. Next, restart your host machine and restart the Docker service with "sudo service docker restart".
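
A minimal sketch of that cleanup, assuming the stale files live under /var/run and Docker is managed by service (adjust to systemctl on systemd hosts):

$ sudo service docker stop
$ sudo rm -f /var/run/docker.pid /var/run/docker.sock   # remove the stale pid and socket files
$ sudo reboot                                            # optional: restart the host machine, then
$ sudo service docker restart                            # restart the Docker service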

Good luck to you !!

Upvotes: 0

Jan

Reputation: 41

Since I've come across a similar problem as well, I eventually found out why it didn't work. In my example I'm using multiple boxes on a LAN, 192.168.10.0/24, that I want to manage from inside that network, and I only allow access from the outside to certain containers -- the following examples are run on the box at 192.168.10.1:

  • Set up the Daemons with --cluster-store consul://192.168.10.1:8500 on port 8500 (deploying Consul & registrator on each Daemon as the first containers) and --cluster-advertise 192.168.10.1:2375, as well as -H tcp://192.168.10.1:2375 -H unix:///var/run/docker.sock -H tcp://127.0.0.1:2375. I do not, however, bind to the other available addresses as you would with tcp://0.0.0.0:2375; instead I only bind to the local 192.168.10.0/24 address (see the daemon-options sketch after this list). In case you want containers to bind only to the local network as well (as I did in this case), you can specify the additional --ip parameter for the Daemon; when containers should be reachable from everywhere else too (in my case only an nginx load balancer with failover via keepalived), you bind the port to all interfaces: docker run ... -p 0.0.0.0:host_port:container_port ... <image>
  • Start the Daemons
  • Deploy gliderlabs/registrator and Consul with Compose (this is an example from the first box in my setup, but I start the equivalent on all Daemons for a complete Consul HA failover setup) using docker-compose -p bootstrap up -d, which names the containers bootstrap_registrator_1 and bootstrap_consul_1 in the private network bootstrap:

    version: '2'
    services:
      registrator:
        image: gliderlabs/registrator
        command: consul://192.168.10.1:8500
        depends_on:
          - consul
        volumes:
          - /var/run/docker.sock:/tmp/docker.sock
        restart: unless-stopped
    
      consul:
        image: consul
        command: agent -server -bootstrap -ui -advertise 192.168.10.1 -client 0.0.0.0
        hostname: srv-0
        network_mode: host
        ports:
          - "8300:8300"     # Server RPC, Server Use Only
          - "8301:8301/tcp" # Serf Gossip Protocol for LAN
          - "8301:8301/udp" # Serf Gossip Protocol for LAN
          - "8302:8302/tcp" # Serf Gossip Protocol for WAN, Server Use Only
          - "8302:8302/udp" # Serf Gossip Protocol for WAN, Server Use Only
          - "8400:8400"     # CLI RPC
          - "8500:8500"     # HTTP API & Web UI
          - "53:8600/tcp"   # DNS Interface
          - "53:8600/udp"   # DNS Interface
        restart: unless-stopped
    
  • Now the Daemons register themselves and set locks in the KV store (Consul) under docker/nodes, but Swarm does not seem to read from this location automatically, so when it tries to find out which Daemons are available it doesn't find any. This bit cost me the most time: to solve it I had to specify --discovery-opt kv.path=docker/nodes and start Swarm with docker-compose -p bootstrap up -d, on all boxes as well, to end up with an HA failover of Swarm managers:

    version: '2'
    services:
      swarm-manager:
        image: swarm
        command: manage -H :3375 --replication --advertise 192.168.10.1:3375 --discovery-opt kv.path=docker/nodes consul://192.168.10.1:8500
        hostname: srv-0
        ports:
          - "192.168.10.1:3375:3375" #
        restart: unless-stopped
    
  • Now I end up with a working Swarm that is only available on the 192.168.10.0/24 network on port 3375. All containers that are started are only available to this network as well, unless I specify -p 0.0.0.0:host_port:container_port when starting them (with docker run).

  • Further scaling: when I add more boxes to the local network to grow capacity, my idea would be to add more Daemons, and maybe non-manager Swarm instances on those as well, plus Consul clients later on (rather than servers, which are started with -server).
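
For reference, a sketch of the daemon options from the first bullet, plus how a client then talks to the resulting Swarm manager. The 192.168.10.1 address, the DOCKER_OPTS variable, and the optional --ip line (which binds published container ports to the LAN address by default) are just this example's assumptions; adjust them to your setup:

DOCKER_OPTS="--cluster-store consul://192.168.10.1:8500 \
  --cluster-advertise 192.168.10.1:2375 \
  -H tcp://192.168.10.1:2375 -H tcp://127.0.0.1:2375 -H unix:///var/run/docker.sock \
  --ip 192.168.10.1"

$ docker -H tcp://192.168.10.1:3375 info                           # talk to the Swarm manager rather than a single Daemon
$ docker -H tcp://192.168.10.1:3375 run -d -p 0.0.0.0:80:80 nginx  # explicitly publish a port on all interfaces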

Upvotes: 4

Auzias

Reputation: 3798

Are you running Consul for the multi-host networking discovery or for the Swarm agent discovery?

Did you try to check the Consul members? Why don't you run the Docker daemon so that it connects to Consul locally, and then consul join the other Consul members? Is there any reason for not doing so?
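
For example (a sketch only, assuming each host runs a local Consul agent in a container named consul; swap the container name and the 192.168.10.x address for your own):

$ docker exec consul consul members             # list the agents currently in the Consul cluster
$ docker exec consul consul join 192.168.10.1   # join this agent to the agent running on 192.168.10.1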

I also suggest the static file method for Swarm agent discovery. It is the fastest, easiest, and safest means I know!
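
A sketch of that static-file discovery with placeholder addresses (one <ip>:<port> Daemon endpoint per line; the file path and port mapping are only examples):

$ cat /tmp/swarm_cluster
192.168.10.2:2375
192.168.10.3:2375
$ docker run -d -p 3375:2375 -v /tmp/swarm_cluster:/tmp/swarm_cluster swarm manage file:///tmp/swarm_cluster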

You should take a look at: how to create docker overlay network between multi hosts? It may help you.

Upvotes: 0
