user4948798

Reputation: 2100

Docker Swarm: Service keeps cycling between Ready and Shutdown

I have a couple of Docker Swarm nodes. When I tried to create a service on the leader with the command below, service creation never finished; it has been running for more than 40 minutes now.

docker service create \
 --mode global \
 --mount type=bind,src=/project/m32/,dst=/root/m32/ \
 --publish mode=host,target=310,published=310 \
 --publish mode=host,target=311,published=311 \
 --publish mode=host,target=312,published=312 \
 --publish mode=host,target=313,published=313 \
 --constraint "node.labels.m32 == true" \
 --name m32 \
 local-registry/ubuntu:07 
overall progress: 1 out of 2 tasks
ew0edluvz39p: ready     [======================================>            ]
kzc7jf7irsrh: running   [==================================================>]

In the service's task list, tasks keep showing up as Ready and then Shutdown:

$ docker service ps m32
ID             NAME                                IMAGE                                        NODE      DESIRED STATE      CURRENT STATE   ERROR     PORTS
s4q0rqrqbpdn   m32.ew0edluvz39pazold0wnv2ean       local-registry/ubuntu:07   sl-089   Ready            Ready 1 second ago                
r6vibgptm5oc    \_ m32.ew0edluvz39pazold0wnv2ean   local-registry/ubuntu:07   sl-089   Shutdown         Complete 1 second ago             
joq2p6c9jpnx    \_ m32.ew0edluvz39pazold0wnv2ean   local-registry/ubuntu:07   sl-089   Shutdown         Complete 7 seconds ago            
a5h8gac02vfx    \_ m32.ew0edluvz39pazold0wnv2ean   local-registry/ubuntu:07   sl-089   Shutdown         Complete 13 seconds ago           
f51stfsdlhvp    \_ m32.ew0edluvz39pazold0wnv2ean   local-registry/ubuntu:07   sl-089   Shutdown         Complete 19 seconds ago           
zqcbxkm4fwhr   m32.kzc7jf7irsrhnx3kurcwqjb2j       local-registry/ubuntu:07   sl-090   Ready            Ready less than a second ago      
za8efvi9x4yw    \_ m32.kzc7jf7irsrhnx3kurcwqjb2j   local-registry/ubuntu:07   sl-090   Shutdown         Complete less than a second ago  
$ sudo systemctl status docker.service

Nov 24 19:58:48 svr2 dockerd[2797]: time="2021-11-24T19:58:48.200421563+05:30" level=info msg="ignoring event" container=ea8b76fedb18159ba0cd8f279a9ca4264399c>
Nov 24 20:01:39 svr2 dockerd[2797]: time="2021-11-24T20:01:39.602028420+05:30" level=info msg="NetworkDB stats svr2(00bbf0799aa6) - netID:ubuzyty9mq4tb7xyb>
Nov 24 20:06:39 svr2 dockerd[2797]: time="2021-11-24T20:06:39.802013427+05:30" level=info msg="NetworkDB stats svr2(00bbf0799aa6) - netID:ubuzyty9mq4tb7xyb>
Nov 24 20:11:40 svr2 dockerd[2797]: time="2021-11-24T20:11:40.001992437+05:30" level=info msg="NetworkDB stats svr2(00bbf0799aa6) - netID:ubuzyty9mq4tb7xyb>
Nov 24 20:14:17 svr2 dockerd[2797]: time="2021-11-24T20:14:17.871605342+05:30" level=error msg="Error getting service xkauq9a599iv: service xkauq9a599iv not f>
Nov 24 20:14:52 svr2 dockerd[2797]: time="2021-11-24T20:14:52.833890158+05:30" level=error msg="Error getting service xkauq9a599iv: service xkauq9a599iv not f>
Nov 24 20:15:12 svr2 dockerd[2797]: time="2021-11-24T20:15:12.395692837+05:30" level=error msg="Error getting service pwaa8cvdd683: service pwaa8cvdd683 not f>
Nov 24 20:15:17 svr2 dockerd[2797]: time="2021-11-24T20:15:17.773200054+05:30" level=error msg="Error getting service xk0v0g2roypx: service xk0v0g2roypx not f>
Nov 24 20:16:18 svr2 dockerd[2797]: time="2021-11-24T20:16:18.529344060+05:30" level=error msg="Error getting service xk0v0g2roypx: service xk0v0g2roypx not f>
Nov 24 20:16:40 svr2 dockerd[2797]: time="2021-11-24T20:16:40.201888504+05:30" level=info msg="NetworkDB stats svr2(00bbf0799aa6) - netID:ubuzyty9mq4tb7xyb>

It looks like a loop keeps creating containers. What am I doing wrong? Any help fixing this problem would be highly appreciated. Thanks.

Upvotes: 0

Views: 1866

Answers (1)

Chris Becke

Reputation: 36141

You really need to pass --restart-max-attempts 5 to your docker service create to ensure that tasks don't restart too many times in a loop. It's bad for the stability of Docker and hard to debug. It is better to have a task give up and stop so you can see that something is wrong and diagnose it.
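
As a rough sketch, the command from the question with the restart cap added could look like the following. The wrapper functions `create_m32_service` and `cap_m32_restarts` and the `DOCKER` override are my own names for illustration, not part of Docker; every flag other than --restart-max-attempts is copied from the question.

```shell
# Assumption: DOCKER is a hypothetical override so the commands can be
# previewed, e.g. DOCKER=echo prints them instead of running them.
DOCKER="${DOCKER:-docker}"

# Create the service as in the question, but stop retrying a task
# after 5 restart attempts.
create_m32_service() {
  "$DOCKER" service create \
    --mode global \
    --restart-max-attempts 5 \
    --mount type=bind,src=/project/m32/,dst=/root/m32/ \
    --publish mode=host,target=310,published=310 \
    --publish mode=host,target=311,published=311 \
    --publish mode=host,target=312,published=312 \
    --publish mode=host,target=313,published=313 \
    --constraint "node.labels.m32 == true" \
    --name m32 \
    local-registry/ubuntu:07
}

# Apply the same cap to a service that already exists.
cap_m32_restarts() {
  "$DOCKER" service update --restart-max-attempts 5 m32
}
```

Call create_m32_service for a new service, or cap_m32_restarts if the looping service is already running.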

To see specifically what is wrong, look at the logs of each task. Use the individual task IDs to see why each one failed:

# The logs for a task
docker service logs s4q0rqrqbpdn
# A general breakdown of a task
docker inspect s4q0rqrqbpdn
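
Before digging into individual tasks, it can help to list every task with its untruncated error message. This is only a sketch: `task_errors` and the `DOCKER` override are hypothetical names of mine, while --no-trunc and --format are regular docker service ps options.

```shell
# Assumption: DOCKER is a hypothetical override (DOCKER=echo previews the command).
DOCKER="${DOCKER:-docker}"

# Print ID, node, current state and full error text for each task of a service.
task_errors() {
  "$DOCKER" service ps --no-trunc \
    --format 'table {{.ID}}\t{{.Node}}\t{{.CurrentState}}\t{{.Error}}' \
    "$1"
}
```

For the service in the question that would be task_errors m32.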

Sometimes you need to track down the actual container for the task and inspect that. docker container is not swarm aware, so you have to run it on the node where the task ran:

# list the service showing the full task id.
docker service ps <service> --no-trunc

# then docker context use <node> / ssh <node> to switch to a node of interest.

# Then, the container name is the "NAME"."ID" from the ps list, using the full task ID. For example:
docker context use sl-089
docker container inspect m32.ew0edluvz39pazold0wnv2ean.s4q0rqrqbpdnABCDEFGABCDEFG

Inspecting the container can show if it was killed because of an OOM or certain other reasons that don't otherwise show up.
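
A minimal sketch of that last step, assuming you are already on the right node. The helper names `task_container_name` and `container_exit_state` and the `DOCKER` override are made up for this example; the State fields are the ones docker container inspect reports.

```shell
# Assumption: DOCKER is a hypothetical override (DOCKER=echo previews the command).
DOCKER="${DOCKER:-docker}"

# Build the container name from the NAME and full ID columns of
# `docker service ps <service> --no-trunc`.
task_container_name() {
  printf '%s.%s\n' "$1" "$2"
}

# Show only the exit code, OOM flag and error message of a container.
container_exit_state() {
  "$DOCKER" container inspect \
    --format 'exit={{.State.ExitCode}} oom={{.State.OOMKilled}} error={{.State.Error}}' \
    "$1"
}
```

Usage would be something like container_exit_state "$(task_container_name <NAME> <full task ID>)". An oom value of true means the kernel killed the container for exceeding its memory limit.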

Upvotes: 2
