anghel adrian

Reputation: 182

Docker swarm stop grace period doesn't work as expected

I am running Docker in swarm mode with several nodes in the cluster. According to the documentation here: https://docs.docker.com/engine/reference/commandline/service_update/ and here: https://docs.docker.com/engine/reference/commandline/service_create/, the --stop-grace-period option sets the time to wait before force killing a container.

Expected behavior - I expected Docker to wait for this period of time before it tries to stop a running container during a rolling update.

Actual behavior - Docker sends the termination signal a few seconds after the new container, with the new version of the image, starts.

Steps to reproduce the behavior

  1. docker service create --replicas 1 --stop-grace-period 60s --update-delay 60s --update-monitor 5s --update-order start-first --name nginx nginx:1.15.8
  2. Wait for the service to start up the container (approx. 2 minutes)
  3. docker service update --image nginx:1.15.9 nginx
  4. docker ps -a
  5. As you can see, the new container started and after a second, the old one was killed by Docker.

Any idea why?

I also opened an issue on GitHub, here: https://github.com/docker/for-linux/issues/615

Upvotes: 2

Views: 6389

Answers (2)

ozlevka

Reputation: 2156

I think you can close the issue on GitHub.

--stop-grace-period is the period between the stop signal (SIGTERM) and the kill signal (SIGKILL).

Of course, you can change SIGTERM to another signal by using the --stop-signal switch. How the application inside the container behaves when it receives the stop signal is your responsibility.

Here is a good article explaining how this works.

Upvotes: 1

programmerq

Reputation: 6554

The --stop-grace-period value is the amount of time that Docker will wait, after sending a SIGTERM, before it gives up waiting for the container to exit gracefully. Once the grace period has elapsed, it will kill the container with a SIGKILL.

Based on your description of your setup, the sequence of events seems to be happening as designed. Your container exits cleanly and quickly when it gets its SIGTERM, so Docker never needs to send a SIGKILL.

I see you also specified --update-delay 60s, but that has no effect since you only have one replica. The update delay tells Docker to wait 60 seconds after updating one task before moving on to the next, so it only helps with two or more replicas.

It seems like you want your single-replica service to run a new task and an old task concurrently for 60 seconds, but swarm mode is happy to get rid of the old container with a SIGTERM as soon as the new container is up.
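
If the old task really should linger, one option is to have the process inside the container delay its own exit when it receives SIGTERM. Below is a minimal sketch of that idea in plain sh, not something taken from the nginx image: run_worker and its drain-time argument are hypothetical names, and the loop is only a stand-in for a real workload. The drain time has to stay shorter than --stop-grace-period, otherwise Docker will send SIGKILL first. The last few lines simulate what Docker does on stop by sending SIGTERM to the worker.

```shell
#!/bin/sh
# Sketch of an entrypoint that keeps running for a while after SIGTERM.
# run_worker and its argument are illustrative names, not Docker options.

run_worker() {
    drain_seconds="$1"
    # On SIGTERM: keep going for drain_seconds, then exit cleanly.
    trap 'echo "SIGTERM received, draining for ${drain_seconds}s"; sleep "$drain_seconds"; echo "drained, exiting"; exit 0' TERM
    # Stand-in for the real workload. Sleeping in the background and
    # waiting on it lets the shell run the trap as soon as the signal arrives.
    while :; do
        sleep 1 &
        wait $!
    done
}

# Demo: start the worker, then send it SIGTERM the way Docker would on stop.
run_worker 1 &
worker_pid=$!
sleep 1                      # give the worker time to install its trap
kill -TERM "$worker_pid"
wait "$worker_pid"
worker_status=$?
echo "worker exit status: $worker_status"
```

In a real service the drain time would be set to something like 50 seconds alongside --stop-grace-period 60s, so the old container exits on its own before the grace period runs out.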

Upvotes: 1
