Reputation: 2232
I am trying to set up a Docker cluster with swarm and consul. I have a manager, host1, and host2. I run the consul and swarm manager containers on the manager.
$ docker run --rm -p 8500:8500 progrium/consul -server -bootstrap
$ docker run -d -p 2377:2375 swarm manage consul://<manager>:8500
On host1 and host2, I modify the daemon options with --cluster-store and --cluster-advertise, and restart the docker daemon.
host1
DOCKER_OPTS="--cluster-store=consul://<manager>:8500 --cluster-advertise=<host1>:2375"
host2
DOCKER_OPTS="--cluster-store=consul://<manager>:8500 --cluster-advertise=<host2>:2375"
When I join host1 and host2 to the swarm, it fails.
host1 $ docker run --rm swarm join --advertise=<host1>:2375 consul://<manager>:8500
host2 $ docker run --rm swarm join --advertise=<host2>:2375 consul://<manager>:8500
The swarm manager log shows these errors:
time="2016-01-20T02:17:17Z" level=error msg="Get http://<host1>:2375/v1.15/info: dial tcp <host1>:2375: getsockopt: connection refused"
time="2016-01-20T02:17:20Z" level=error msg="Get http://<host2>:2375/v1.15/info: dial tcp <host2>:2375: getsockopt: connection refused"
Upvotes: 7
Views: 6148
Reputation: 1
Please remove docker.pid and docker.sock from /var/run. Then restart your host machine and restart the docker service with sudo service docker restart.
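In shell form, the steps above might look like this (a sketch; paths and service name as given in this answer):

# remove the stale pid and socket files named above
sudo rm /var/run/docker.pid /var/run/docker.sock
# restart the host machine
sudo reboot
# after the machine is back up, restart the docker service
sudo service docker restart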
Good luck!
Upvotes: 0
Reputation: 41
Since I ran into a similar problem as well, I did eventually find out why it didn't work. In my example I'm using multiple boxes on a LAN (192.168.10.0/24) that I want to manage from inside that network, while only allowing access from the outside to certain containers; the following examples are run on the box at 192.168.10.1.
I start each daemon with --cluster-store consul://192.168.10.1:8500 on port 8500 (deploying Consul and registrator on each daemon as the first containers) and --cluster-advertise 192.168.10.1:2375, as well as -H tcp://192.168.10.1:2375 -H unix:///var/run/docker.sock -H tcp://127.0.0.1:2375. I do not, however, bind to all other available addresses as you would with tcp://0.0.0.0:2375; instead I bind only to the local 192.168.10.0/24 network. If you want containers to bind only to the local network as well (as I did in this case), you can pass the additional --ip parameter to the daemon. When a container should be reachable from everywhere else (in my case only an nginx load balancer with failover via keepalived), you bind its port to all interfaces: docker run ... -p 0.0.0.0:host_port:container_port ... <image>
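Put together, the daemon options described above would look roughly like this, in the DOCKER_OPTS style the question uses (a sketch for the box at 192.168.10.1; adjust the addresses per box):

# all flags from the paragraph above; --ip is optional and makes
# published ports default to the LAN address instead of 0.0.0.0
DOCKER_OPTS="--cluster-store consul://192.168.10.1:8500 \
  --cluster-advertise 192.168.10.1:2375 \
  -H tcp://192.168.10.1:2375 -H tcp://127.0.0.1:2375 -H unix:///var/run/docker.sock \
  --ip 192.168.10.1"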
Deploy gliderlabs/registrator and Consul with compose (this is the example from the first box in my setup, but I start the equivalent on all daemons for a complete Consul HA failover setup). Running docker-compose -p bootstrap up -d names the containers bootstrap_registrator_1 and bootstrap_consul_1 in the private network bootstrap:
version: '2'
services:
  registrator:
    image: gliderlabs/registrator
    command: consul://192.168.10.1:8500
    depends_on:
      - consul
    volumes:
      - /var/run/docker.sock:/tmp/docker.sock
    restart: unless-stopped

  consul:
    image: consul
    command: agent -server -bootstrap -ui -advertise 192.168.10.1 -client 0.0.0.0
    hostname: srv-0
    network_mode: host
    ports:
      - "8300:8300"     # Server RPC, Server Use Only
      - "8301:8301/tcp" # Serf Gossip Protocol for LAN
      - "8301:8301/udp" # Serf Gossip Protocol for LAN
      - "8302:8302/tcp" # Serf Gossip Protocol for WAN, Server Use Only
      - "8302:8302/udp" # Serf Gossip Protocol for WAN, Server Use Only
      - "8400:8400"     # CLI RPC
      - "8500:8500"     # HTTP API & Web UI
      - "53:8600/tcp"   # DNS Interface
      - "53:8600/udp"   # DNS Interface
    restart: unless-stopped
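To verify that Consul came up and that the daemons registered, you can query its HTTP API (a quick check, assuming the address from the example above):

# list the Consul cluster nodes
curl http://192.168.10.1:8500/v1/catalog/nodes
# list the keys the daemons write under docker/nodes via the KV endpoint
curl "http://192.168.10.1:8500/v1/kv/docker/nodes?keys"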
Now the daemons register and set locks in the KV store (Consul) under docker/nodes, but Swarm does not seem to read from this location automatically, so when it tries to find out which daemons are available, it doesn't find any. This bit cost me the most time: to solve it I had to specify --discovery-opt kv.path=docker/nodes and start Swarm with docker-compose -p bootstrap up -d on all boxes as well, to end up with an HA failover of Swarm managers:
version: '2'
services:
  swarm-manager:
    image: swarm
    command: manage -H :3375 --replication --advertise 192.168.10.1:3375 --discovery-opt kv.path=docker/nodes consul://192.168.10.1:8500
    hostname: srv-0
    ports:
      - "192.168.10.1:3375:3375"
    restart: unless-stopped
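Once the managers are up, pointing a client at the Swarm endpoint instead of the local daemon should list the discovered nodes (assuming the address above):

# ask the Swarm manager for cluster-wide info; the daemons registered
# under docker/nodes should show up in the node list
docker -H tcp://192.168.10.1:3375 info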
Now I end up with a working Swarm that is only available on the 192.168.10.0/24 network on port 3375. All containers that are started are only available to this network as well, unless I specify -p 0.0.0.0:host_port:container_port when starting them (with docker run).
Upvotes: 4
Reputation: 3798
Are you running consul for multi-host networking discovery or for Swarm agent discovery?
Did you try to check consul members?
Why don't you run the docker daemon so it connects to a local consul agent, and then consul join the other consul members? Is there any reason for not doing so?
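For the consul members check, you can run it inside the consul container the question already starts (a sketch; <consul_container> is a placeholder for the actual container name or ID):

# list the members of the consul cluster from inside the running container
docker exec <consul_container> consul members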
I also suggest the static file method for Swarm agent discovery: the fastest, easiest, and safest means I know!
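A minimal sketch of that static file method, using the question's placeholder hosts (the file path is hypothetical; the file must be mounted into the manager container):

# list the engine addresses, one per line, in a plain text file
echo "<host1>:2375" >  /tmp/cluster
echo "<host2>:2375" >> /tmp/cluster
# point the manager at the file instead of consul
docker run -d -p 2377:2375 -v /tmp/cluster:/tmp/cluster swarm manage file:///tmp/cluster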
You should take a look at: how to create docker overlay network between multi hosts? it may help you.
Upvotes: 0