Maximilien Belinga

Reputation: 3186

Docker Swarm with Zookeeper - No elected primary cluster manager

I have been tasked with building a production-ready Swarm cluster using ZooKeeper as the discovery backend. I followed the official documentation, https://docs.docker.com/swarm/install-manual/, and for backend discovery I used https://docs.docker.com/swarm/discovery/. Now I have an issue: when I try to communicate with the swarm, I get this error: No elected primary cluster manager.

This is my setup:

I'm running Ubuntu 16.04 with Docker client/server version 1.12.3 and ZooKeeper 3.4.9, which is launched on the same host as my swarm manager. I'm using a two-node architecture with one swarm manager and one swarm worker.

After installing Docker Engine on each node, I start the daemon:

$ nohup docker daemon -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock &

Now on the swarm manager:

$ docker run -d -p 4000:4000 swarm manage -H :4000 --replication --advertise <swarm-manager-ip>:4000 zk://<swarm-manager-ip>/swarm

On the swarm worker:

$ docker run -d swarm join --advertise=<swarm-worker-ip>:2375 zk://<swarm-manager-ip>/swarm

To check that everything works, I run the command below and get the following result.

$ docker -H <swarm-manager-ip>:4000 ps -a
Error response from daemon: No elected primary cluster manager

When I just do this:

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                              NAMES
91c3864ba6ee        swarm               "/swarm manage -H :40"   17 hours ago        Up 19 minutes       2375/tcp, 0.0.0.0:4000->4000/tcp   swarm-master

I can see the swarm master container, and when I look at its logs, I see this:

$ docker logs 91c3864ba6ee
time="2016-12-09T20:29:39Z" level=info msg="Initializing discovery without TLS" 
time="2016-12-09T20:29:39Z" level=info msg="Listening for HTTP" addr=":4000" proto=tcp 
time="2016-12-09T20:29:39Z" level=info msg="Leader Election: Cluster leadership lost" 
2016/12/09 20:29:40 Failed to connect to <swarm-manager-ip>:2181: dial tcp <swarm-manager-ip>:2181: i/o timeout
time="2016-12-09T20:29:40Z" level=error msg="zk: could not connect to a server" 
time="2016-12-09T20:29:40Z" level=error msg="zk: could not connect to a server" 
time="2016-12-09T20:29:40Z" level=error msg="Discovery error: zk: could not connect to a server" 
2016/12/09 20:29:42 Failed to connect to <swarm-manager-ip>:2181: dial tcp <swarm-manager-ip>:2181: i/o timeout
time="2016-12-09T20:29:42Z" level=error msg="Discovery error: zk: could not connect to a server" 
2016/12/09 20:29:44 Failed to connect to <swarm-manager-ip>:2181: dial tcp <swarm-manager-ip>:2181: i/o timeout
time="2016-12-09T20:29:44Z" level=error msg="Discovery error: zk: could not connect to a server" 
time="2016-12-09T20:29:44Z" level=error msg="Discovery error: Unexpected watch error" 
2016/12/09 20:29:46 Failed to connect to <swarm-manager-ip>:2181: dial tcp <swarm-manager-ip>:2181: i/o timeout
2016/12/09 20:29:48 Failed to connect to <swarm-manager-ip>:2181: dial tcp <swarm-manager-ip>:2181: i/o timeout
time="2016-12-09T20:29:50Z" level=info msg="Leader Election: Cluster leadership lost" 
2016/12/09 20:29:50 Failed to connect to <swarm-manager-ip>:2181: dial tcp <swarm-manager-ip>:2181: i/o timeout
time="2016-12-09T20:29:50Z" level=error msg="zk: could not connect to a server" 
time="2016-12-09T20:29:50Z" level=error msg="zk: could not connect to a server" 

But a simple telnet command shows me that my ZooKeeper host is working. So why do I get an i/o timeout when Swarm tries to connect to the ZooKeeper discovery backend?

Upvotes: 1

Views: 973

Answers (1)

abronan

Reputation: 3439

As mentioned in the comments, there is a newer version called Swarm mode, embedded in Docker since 1.12. It includes a built-in, highly available distributed object store, so you don't have to set up an external KV store yourself.

Now regarding your issue with the first version of Swarm, one line caught my attention:

2016/12/09 20:29:50 Failed to connect to <swarm-manager-ip>:2181: dial tcp <swarm-manager-ip>:2181: i/o timeout

To me it seems that zookeeper is not running on your machine or that you didn't point to the right port.

First check that zookeeper is running on your machine with:

ps aux | grep zookeeper

You should see a process running.
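
A running process does not by itself prove that the client port accepts connections. As a rough sketch, assuming bash and the coreutils `timeout` command are available, the hypothetical helper `zk_port_open` below tries to open a TCP connection to the ZooKeeper port. `ZK_HOST` is a placeholder for the address you use in the zk:// URL.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ZooKeeper or Docker): succeeds only if a
# TCP connection to host:port can be opened within 3 seconds.
zk_port_open() {
  local host="$1" port="${2:-2181}"
  # bash's /dev/tcp pseudo-device opens a TCP socket; $0 and $1 inside the
  # inner shell are the host and port passed after the script string.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# ZK_HOST is a placeholder; set it to the address used in the zk:// URL.
ZK_HOST="${ZK_HOST:-127.0.0.1}"
if zk_port_open "$ZK_HOST" 2181; then
  echo "port 2181 on $ZK_HOST is reachable"
else
  echo "port 2181 on $ZK_HOST is NOT reachable (refused or timed out)"
fi
```

Running the same check from the worker node as well can be informative: a timeout, rather than an immediate refusal, usually suggests that packets are being filtered somewhere on the way.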

If not, make sure you create a zoo.cfg file in the conf directory of your zookeeper installation specifying the right port, for example:

tickTime=2000
dataDir=/var/zookeeper
clientPort=2181
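
To confirm which port an existing configuration actually uses, something like the following could help. This is only a sketch: `zk_client_port` is a hypothetical helper, and `/opt/zookeeper/conf/zoo.cfg` is an assumed install location, so adjust it to match yours.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the clientPort value from a zoo.cfg-style file.
# Whitespace around the key and value is ignored.
zk_client_port() {
  awk -F= '{ k = $1; gsub(/[ \t]/, "", k) }
           k == "clientPort" { v = $2; gsub(/[ \t\r]/, "", v); print v }' "$1"
}

# Assumed path, not taken from the question; override with ZOO_CFG=...
ZOO_CFG="${ZOO_CFG:-/opt/zookeeper/conf/zoo.cfg}"
if [ -r "$ZOO_CFG" ]; then
  echo "clientPort=$(zk_client_port "$ZOO_CFG")"
else
  echo "cannot read $ZOO_CFG"
fi
```

The port printed here should match the one Swarm dials, which is 2181 in the log line quoted above.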

You can follow this tutorial to bootstrap ZooKeeper.

After this, you can run the zkServer.sh script (bin/zkServer.sh start) to start your ZooKeeper instance, and Swarm should then be able to connect properly and register the Leader key.
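
ZooKeeper may need a moment after startup before it accepts clients, so it can help to wait for the port before launching the swarm manager. A minimal sketch, assuming a POSIX-style shell: `retry` is a hypothetical helper, and the probe shown in the usage comment is an assumption that requires `nc` to be installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to N times, sleeping between attempts.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    # Stop as soon as the probe command succeeds.
    "$@" && return 0
    # Sleep only if another attempt will follow.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example (assumes nc is installed and ZooKeeper listens on 127.0.0.1:2181):
#   retry 10 2 nc -z 127.0.0.1 2181 && echo "ZooKeeper is accepting connections"
```

Starting the swarm manage container only after the probe succeeds avoids the first burst of connection errors in its log.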

If this still does not work, try downgrading to ZooKeeper 3.4.6, as it is the last version known to be supported since the switch to Docker Swarm mode.

Upvotes: 1
