Reputation: 5201
Problem
I'm testing Docker Swarm and it won't deploy any stacks: after docker stack deploy, the replicas remain at 0/1.
It's a two-node cluster of 1 manager and 1 worker. The manager is set to drain, so that the stack will deploy to the worker.
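For reference, the manager's availability was set with the standard node update command, roughly:
docker node update --availability drain test-portal.private.network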
Here's what I'm doing:
docker-compose.yml:
---
services:
  whoami:
    image: traefik/whoami
Then deploying the stack with:
test-portal:/app/whoami$ docker stack deploy -c docker-compose.yml whoami
Since --detach=false was not specified, tasks will be created in the background.
In a future release, --detach=false will become the default.
Creating network whoami_default
Creating service whoami_whoami
There are no containers deployed on either machine. The following can be observed:
test-portal:/app/whoami$ docker service ls
ID             NAME            MODE         REPLICAS   IMAGE                   PORTS
tsxrvw12zg6i   whoami_whoami   replicated   0/1        traefik/whoami:latest
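For completeness, the per-task scheduling state (and any error messages) can be pulled up with the usual command; including it here since the replicas never leave 0/1:
docker service ps --no-trunc whoami_whoami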
test-portal:/app/whoami$ docker network ls
NETWORK ID     NAME              DRIVER    SCOPE
149473faf294   bridge            bridge    local
76df1d9a4c91   docker_gwbridge   bridge    local
0939cda44322   host              host      local
n00f2g1whcn0   ingress           overlay   swarm
ee11daff62a5   none              null      local
uno0f18wnbbp   whoami_default              swarm
test-portal:/app/whoami$ docker network inspect uno
[
    {
        "Name": "whoami_default",
        "Id": "uno0f18wnbbp14jv0000wnnzs",
        "Created": "2025-03-02T14:38:34.136561408Z",
        "Scope": "swarm",
        "Driver": "",
        "EnableIPv4": false,
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "",
            "Options": null,
            "Config": null
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": null,
        "Options": null,
        "Labels": {
            "com.docker.stack.namespace": "whoami"
        }
    }
]
Potential clues:
This swarm could originally stack deploy fine. However, I discovered containers in the overlay network were being created in the default 10.0.0.x subnet, which conflicts with my own network. I did swarm leave on both machines and recreated the swarm with --default-addr-pool 10.1.99.0/24 instead, which is where I'm stuck now. Recreating the swarm with defaults oddly still left me stuck (couldn't stack deploy anyways), only to get locked in this state (again) after I tried recreating the swarm with different settings.
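To be explicit about what recreating the swarm involved, the sequence was roughly the following (join token elided):
docker swarm leave --force                                                        # on both nodes
docker swarm init --advertise-addr 10.1.5.101 --default-addr-pool 10.1.99.0/24   # on the manager
docker swarm join --token <worker-token> 10.1.5.101:2377                          # on the worker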
Update #1
More experimentation shows:
- If I tear down the swarm, run docker system prune --all, and rebuild it with default settings (docker swarm init --advertise-addr 10.1.5.101) as a single node, it deploys.
- If I tear down the swarm, run docker system prune --all, and rebuild it with different subnet settings (docker swarm init --advertise-addr 10.1.5.101 --default-addr-pool 10.99.99.0/24) as a single node, it fails to deploy.
This seems to suggest network configuration/subnets on Docker Swarm are part of the problem.
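(As an aside, I know a stack can also pin its overlay subnet explicitly in the compose file instead of relying on the swarm's default pool; a sketch, with a made-up subnet I haven't actually tried:)
networks:
  default:
    driver: overlay
    ipam:
      config:
        - subnet: 10.123.45.0/24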
Update #2
I'm having some success tearing down and recreating swarms with docker swarm init --default-addr-pool 10.100.0.0/16 --default-addr-pool-mask-length 24 from this post. At least the service comes up and is addressable.
I don't understand why the /16 pool and the changed default mask length work... was this because creating with a /24 pool was too restrictive, or was there some kind of interference?
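My back-of-the-envelope guess (an assumption, not something I've verified): every overlay network, including ingress, takes one subnet of SubnetSize out of the default address pool, so a /24 pool with a /24 subnet size has exactly one subnet to hand out, while a /16 pool with mask length 24 has 256:
echo $(( 2 ** (24 - 24) ))   # /24 pool, /24 subnets -> 1 subnet (consumed by ingress?)
echo $(( 2 ** (24 - 16) ))   # /16 pool, /24 subnets -> 256 subnets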
Still, the containers created in these new networks (e.g. 10.100.1.3 for whoami_default) seem unreachable from the manager or worker node itself. Do I still have a network misconfiguration? Shouldn't Docker be routing to this subnet?
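For what it's worth, my understanding is that overlay container IPs aren't directly routable from the host itself, and the usual way to reach a swarm service from a node is a published port through the ingress routing mesh, e.g. (hypothetical mapping, not in my current compose file):
services:
  whoami:
    image: traefik/whoami
    ports:
      - "8080:80"   # whoami listens on 80; any node's :8080 would route to it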
Other environmental info:
test-portal:/app/whoami$ docker node ls
ID                            HOSTNAME                            STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
phg2egnc6m4o7vpf47osurp4s *   test-portal.private.network         Ready    Drain          Leader           28.0.1
wer4delycdryonov1l1u3tp4l     test-swarm-node-1.private.network   Ready    Active                          28.0.
test-portal:/app/whoami$ docker info
Client: Docker Engine - Community
 Version:    28.0.1
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.21.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.33.1
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 2
 Server Version: 28.0.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: active
  NodeID: phg2egnc6m4o7vpf47osurp4s
  Is Manager: true
  ClusterID: yq0wosj1mkj28sbkv6vfoj42j
  Managers: 1
  Nodes: 2
  Default Address Pool: 10.1.99.0/24
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 10.1.5.101
  Manager Addresses:
   10.1.5.101:2377
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: bcc810d6b9066471b0b6fa75f557a15a1cbf31bb
 runc version: v1.2.4-0-g6c52b3f
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.1.0-18-amd64
 Operating System: Debian GNU/Linux 12 (bookworm)
 OSType: linux
 Architecture: x86_64
 CPUs: 1
 Total Memory: 3.816GiB
 Name: test-portal.private.network
 ID: 9538b0fe-cc7d-4882-97cc-52c5f1d0bb92
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  ::1/128
  127.0.0.0/8
 Live Restore Enabled: false
test-portal:/app/whoami$ docker network inspect n0
[
    {
        "Name": "ingress",
        "Id": "n00f2g1whcn0wzsts4hvgvyzo",
        "Created": "2025-03-01T14:56:43.92160206-05:00",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv4": true,
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.1.99.0/24",
                    "Gateway": "10.1.99.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": true,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "ingress-sbox": {
                "Name": "ingress-endpoint",
                "EndpointID": "309e69924e66c572bbc690672cb8cd9ef466b8b4f3d7e8e9228d0ee7513f71ae",
                "MacAddress": "02:42:0a:01:63:02",
                "IPv4Address": "10.1.99.2/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4096"
        },
        "Labels": {},
        "Peers": [
            {
                "Name": "24b3935b696e",
                "IP": "10.1.5.101"
            },
            {
                "Name": "9ab40688269f",
                "IP": "10.1.5.102"
            }
        ]
    }
]
Upvotes: 1
Views: 27