Alex Cohen

Reputation: 6206

How to enable faster container rescheduling with Docker Swarm and Consul?

For some background on my environment:

I have Docker Swarm running on 3 Ubuntu 14.04 Vagrant boxes. The swarm master runs on one machine (along with Consul), and the other two machines run swarm workers that are joined to the master. I set up the environment following the documentation page https://docs.docker.com/swarm/install-manual/. It works correctly: any docker -H :4000 <some_docker_command> run from my master machine works fine. Service discovery is active, as I am running the gliderlabs/registrator container on both of my workers.

The issue:

Any change to my cluster, such as a node or container failure, and the subsequent rescheduling of containers by swarm (the containers are created with the flag -e "reschedule:on-node-failure") takes about 30 to 45 seconds to show up. By comparison, when I was running fleet and etcd on CoreOS systems, container rescheduling and notification of node failures usually occurred within about 5 seconds. Is there any way to change settings in Consul and Docker Swarm to speed everything up to a level similar to what I experienced with fleet and etcd on CoreOS? If so, what would I need to do?

tl;dr: I am running swarm with Consul. Container rescheduling and changes to the output of docker -H :4000 ps don't occur until about 30 to 45 seconds after a node goes down. How can I reduce this time period?

Upvotes: 2

Views: 301

Answers (1)

Peter Svensson

Reputation: 6173

You could try to set the TTL and retries to lower values to get the swarm manager to act faster on failures.

For example:

docker run swarm manage --engine-failure-retry=1 consul://x.y.z.a:8500
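As a rough sketch of how the TTL, heartbeat, and refresh settings might be combined, the script below builds and prints one manager command and one worker join command rather than running them. The flags are the ones I believe legacy standalone swarm accepts for manage (--engine-refresh-min-interval, --engine-refresh-max-interval, --engine-failure-retry) and join (--heartbeat, --ttl). The durations, node_ip:2375, and the -p 4000:4000 -H :4000 port mapping are assumptions chosen to match the question, not tested recommendations. Check swarm manage --help and swarm join --help for your version first.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.
# All addresses and durations are placeholder assumptions, not tested values.
CONSUL_ADDR="x.y.z.a:8500"   # Consul address (placeholder, as in the example above)
NODE_ADDR="node_ip:2375"     # hypothetical address of one worker's Docker engine

# Manager: refresh engine state more often and mark an engine as failed
# after a single failed retry.
MANAGE_CMD="docker run -d -p 4000:4000 swarm manage -H :4000 \
--engine-refresh-min-interval=5s \
--engine-refresh-max-interval=10s \
--engine-failure-retry=1 \
consul://$CONSUL_ADDR"

# Worker: send heartbeats more often and let the discovery entry expire sooner.
# The TTL is kept larger than the heartbeat so a live node does not expire.
JOIN_CMD="docker run -d swarm join --advertise=$NODE_ADDR \
--heartbeat=5s --ttl=15s \
consul://$CONSUL_ADDR"

echo "$MANAGE_CMD"
echo "$JOIN_CMD"
```

Once the printed commands look right for your setup, run them directly on the master and on each worker. Shorter intervals mean more traffic to Consul and to the engines, so it is probably worth lowering them step by step rather than all at once.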

Full documentation

Upvotes: 0
