Reputation: 3186
I have been tasked with building a production-ready Swarm cluster using ZooKeeper as the discovery backend. I followed the official documentation, https://docs.docker.com/swarm/install-manual/, and for backend discovery I used https://docs.docker.com/swarm/discovery/. Now I have an issue: when I try to communicate with the swarm, I get this error: No elected primary cluster manager.
This is my setup:
I'm running Ubuntu 16.04 with Docker client/server version 1.12.3 and ZooKeeper 3.4.9, which is launched on the same host as my swarm manager. I'm using a two-node architecture with one swarm manager and one swarm worker.
After installing Docker Engine on each node, I started the daemon:
$ nohup docker daemon -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock &
Now on the swarm manager:
$ docker run -d -p 4000:4000 swarm manage -H :4000 --replication --advertise <swarm-manager-ip>:4000 zk://<swarm-manager-ip>/swarm
On the swarm worker:
$ docker run -d swarm join --advertise=<swarm-worker-ip>:2375 zk://<swarm-manager-ip>/swarm
To check that everything works, I ran the command below and got the following result.
$ docker -H <swarm-manager-ip>:4000 ps -a
Error response from daemon: No elected primary cluster manager
When I just do this:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
91c3864ba6ee swarm "/swarm manage -H :40" 17 hours ago Up 19 minutes 2375/tcp, 0.0.0.0:4000->4000/tcp swarm-master
I can see the swarm master, and when I look at the logs of that swarm manager container, I see this:
$ docker logs 91c3864ba6ee
time="2016-12-09T20:29:39Z" level=info msg="Initializing discovery without TLS"
time="2016-12-09T20:29:39Z" level=info msg="Listening for HTTP" addr=":4000" proto=tcp
time="2016-12-09T20:29:39Z" level=info msg="Leader Election: Cluster leadership lost"
2016/12/09 20:29:40 Failed to connect to <swarm-manager-ip>:2181: dial tcp <swarm-manager-ip>:2181: i/o timeout
time="2016-12-09T20:29:40Z" level=error msg="zk: could not connect to a server"
time="2016-12-09T20:29:40Z" level=error msg="zk: could not connect to a server"
time="2016-12-09T20:29:40Z" level=error msg="Discovery error: zk: could not connect to a server"
2016/12/09 20:29:42 Failed to connect to <swarm-manager-ip>:2181: dial tcp <swarm-manager-ip>:2181: i/o timeout
time="2016-12-09T20:29:42Z" level=error msg="Discovery error: zk: could not connect to a server"
2016/12/09 20:29:44 Failed to connect to <swarm-manager-ip>:2181: dial tcp <swarm-manager-ip>:2181: i/o timeout
time="2016-12-09T20:29:44Z" level=error msg="Discovery error: zk: could not connect to a server"
time="2016-12-09T20:29:44Z" level=error msg="Discovery error: Unexpected watch error"
2016/12/09 20:29:46 Failed to connect to <swarm-manager-ip>:2181: dial tcp <swarm-manager-ip>:2181: i/o timeout
2016/12/09 20:29:48 Failed to connect to <swarm-manager-ip>:2181: dial tcp <swarm-manager-ip>:2181: i/o timeout
time="2016-12-09T20:29:50Z" level=info msg="Leader Election: Cluster leadership lost"
2016/12/09 20:29:50 Failed to connect to <swarm-manager-ip>:2181: dial tcp <swarm-manager-ip>:2181: i/o timeout
time="2016-12-09T20:29:50Z" level=error msg="zk: could not connect to a server"
time="2016-12-09T20:29:50Z" level=error msg="zk: could not connect to a server"
But a simple telnet command shows me that my ZooKeeper host is working. So why do I get an i/o timeout when the swarm manager tries to connect to the ZooKeeper discovery backend?
Upvotes: 1
Views: 973
Reputation: 3439
As mentioned in the comments, there is a new version called Swarm mode, embedded in Docker since 1.12. It includes a built-in, highly available distributed object store, so you don't have to set up an external KV store yourself.
Now regarding your issue with the first version of Swarm, one line caught my attention:
2016/12/09 20:29:50 Failed to connect to <swarm-manager-ip>:2181: dial tcp <swarm-manager-ip>:2181: i/o timeout
To me it seems that zookeeper is not running on your machine or that you didn't point to the right port.
First check that zookeeper is running on your machine with:
ps aux | grep zookeeper
You should see a process running.
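A running process does not by itself show that the client port accepts connections, so a quick TCP probe can complement the `ps` check. The following is only a minimal sketch: `zk_port_open` is a hypothetical helper name, and it assumes bash (for the `/dev/tcp` pseudo-device) and the coreutils `timeout` command are available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if a TCP connection to host:port
# can be opened within 3 seconds, non-zero otherwise.
# The port defaults to 2181, the clientPort used in this answer.
zk_port_open() {
  local host="$1" port="${2:-2181}"
  # /dev/tcp/<host>/<port> is a bash feature, not a real file.
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example usage (replace the address with your ZooKeeper host):
#   zk_port_open 127.0.0.1 2181 && echo "port open" || echo "port closed or filtered"
```

Running the probe from the machine that has to reach ZooKeeper is more telling than running it on the ZooKeeper host itself. If `nc` is installed, `echo ruok | nc <swarm-manager-ip> 2181` is another check: a healthy ZooKeeper 3.4.x server answers `imok`.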
If not, make sure you create a zoo.cfg file in the conf directory of your ZooKeeper installation, specifying the right port, for example:
tickTime=2000
dataDir=/var/zookeeper
clientPort=2181
You can look at This Tutorial to bootstrap zookeeper.
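If a zoo.cfg already exists, it can help to confirm which client port it actually sets before pointing swarm at it. This is a small sketch, not part of ZooKeeper itself: `zk_client_port` is a hypothetical helper, and the path you pass to it depends on where your installation lives.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the clientPort value from a zoo.cfg-style file.
# Commented-out lines are ignored. If the key appears more than once,
# the last occurrence wins. Returns 1 and prints nothing if the key is absent.
zk_client_port() {
  local cfg="$1" port
  port="$(sed -n 's/^[[:space:]]*clientPort[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$cfg" | tail -n 1)"
  [ -n "$port" ] && printf '%s\n' "$port"
}

# Example usage (the path is an assumption about your layout):
#   zk_client_port /path/to/zookeeper/conf/zoo.cfg
```

If the printed port is not 2181, include it explicitly in the discovery URL, since 2181 is the port the swarm manager tried in the log above.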
After this you can run the zkServer.sh script (zkServer.sh start, from the bin directory) to start your ZooKeeper instance, and swarm should then be able to connect and register the Leader key.
If this still does not work, try downgrading to ZooKeeper 3.4.6, as this is the last known supported version since the switch to Docker Swarm mode.
Upvotes: 1