ygk

Reputation: 600

Docker-swarm overlay network is not working for containers in different hosts

We have a networking problem in Docker swarm: the overlay network does not work between containers on different hosts.

Where should I look? Any advice?

    server-1:~$ docker version
    Client:
     Version:      17.03.0-ce
     API version:  1.26
     Go version:   go1.7.5
     Git commit:   3a232c8
     Built:        Tue Feb 28 08:01:32 2017
     OS/Arch:      linux/amd64

    Server:
     Version:      17.03.0-ce
     API version:  1.26 (minimum version 1.12)
     Go version:   go1.7.5
     Git commit:   3a232c8
     Built:        Tue Feb 28 08:01:32 2017
     OS/Arch:      linux/amd64
     Experimental: true

PS: I checked this post, but I have the latest version of Docker / Docker swarm, so that issue should already be fixed.

PS 2: A similar problem: https://github.com/docker/swarm/issues/2687

Upvotes: 11

Views: 9418

Answers (6)

Brett Bim

Reputation: 3318

Tangentially related, but this post shows up in searches for similar problems. If you arrive here while trying to set up a Docker swarm on AWS EC2 and have issues similar to the OP's, you may need a Security Group rule that allows IP protocol 50 (ESP, used by IPsec). This applies if you are using encrypted overlay networks.


Upvotes: 1

cn137

Reputation: 31

This resolves the issue mentioned above.
Use the following when initializing the swarm:

    docker swarm init --advertise-addr=YOURIP --listen-addr=0.0.0.0 --data-path-port=7779 --force-new-cluster=true

Resources:

Docker:

VMWare:

Thanks @Izkuru

Upvotes: 2

AtomPi

Reputation: 354

"VTEP Port is reserved or restricted for VMware use, any virtual machine cannot use this port for other purpose or for any other application."

However, the Docker swarm data path port (4789 by default) can be changed to another port:

    docker swarm init --data-path-port=7789
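As far as I know, `--data-path-port` is only available in Docker 19.03 and later and can only be set when the swarm is initialized, so it is worth confirming which port a node actually uses. A minimal sketch, assuming `docker info` prints a `Data Path Port:` line in its Swarm section; the helper name `swarm_data_path_port` is made up for illustration:

```shell
# Extract the swarm data path port from `docker info` output read on stdin.
# Prints nothing if no "Data Path Port:" line is present.
swarm_data_path_port() {
    awk -F': *' '/^[ \t]*Data Path Port:/ { print $2; exit }'
}

# Usage on a swarm node:
#   docker info 2>/dev/null | swarm_data_path_port
```

If the printed port is still 4789 after re-creating the swarm, the flag was probably not applied at init time.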

Upvotes: 19

Izkuru

Reputation: 71

Out of curiosity, in your VMware environment, do you have NSX deployed? I may have an answer, but it only applies if NSX is deployed in the environment.

ESXi will apparently drop OUTBOUND packets from VMs if the destination port is the same as the port configured for the VXLAN VTEP communication.

NSX utilizes port 4789/udp for VTEP communication for VXLAN (by default, as of 6.2.3; prior to that, it was 8472/udp). (If the VMs are on the same host, then traffic is not dropped, because, while it may be OUTBOUND traffic, it does not egress the host, and does not get to the same stage within the VMKernel to be dropped.)

The wording in KB2079386 is a little off. It states:

"VXLAN port 8472 is reserved or restricted for VMware use, any virtual machine cannot use this port for other purpose or for any other application."

But, it should read:

"VTEP Port is reserved or restricted for VMware use, any virtual machine cannot use this port for other purpose or for any other application."

If you are using NSX, you could try changing the port used for the VXLAN VTEPs, but port 4789/udp is required if you are going to leverage hardware VTEPs at all.
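One way to check whether packets are being dropped between the VMs is to capture VXLAN traffic on both hosts while a container on one host pings a container on the other. A rough sketch, assuming `tcpdump` is installed on the hosts; the function name `watch_vxlan` and the `eth0` default are placeholders, not anything from Docker or VMware:

```shell
# Capture overlay data-plane (VXLAN) packets on a host interface.
# $1: interface (default eth0), $2: UDP port (default 4789). Needs root.
watch_vxlan() {
    tcpdump -n -i "${1:-eth0}" udp port "${2:-4789}"
}

# Usage: run on the sending and the receiving host at the same time, e.g.
#   watch_vxlan ens192
```

If the sending VM shows packets leaving for 4789/udp and the receiving VM sees none of them, something between the two is dropping the traffic, which would be consistent with the ESXi behavior described here.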

(I can't take full credit for this. I stumbled across this blog post talking about similar behavior when troubleshooting a similar issue.)

Upvotes: 7

LyphTEC

Reputation: 1029

If your nodes are not on the same subnet (e.g. they all have public IPs), make sure you use the --advertise-addr option whenever a node joins the swarm (managers AND workers), and give it an IP address of that node that the other nodes can reach.

Otherwise the overlay network will not route correctly between hosts, even though stack deployment, node registration, etc. appear to be working fine.

See the detailed explanation of my case in the same GitHub issue: https://github.com/docker/swarm/issues/2687
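In command form, this could look roughly like the sketch below. The wrapper names `init_manager` and `join_node` and the addresses in the usage comment are placeholders; 2377 is the default swarm management port.

```shell
# On the first manager: advertise an address the other nodes can reach.
# $1: this manager's reachable address.
init_manager() {
    docker swarm init --advertise-addr "$1"
}

# On every joining node (manager or worker): advertise its own reachable
# address. $1: this node's address, $2: join token, $3: manager address.
join_node() {
    docker swarm join --advertise-addr "$1" --token "$2" "$3:2377"
}

# Usage (documentation-range placeholder addresses):
#   init_manager 203.0.113.10
#   join_node 203.0.113.11 "$TOKEN" 203.0.113.10
```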

Upvotes: 1

BMitch

Reputation: 264831

The first thing I would check for overlay networking is your firewall rules. You need the following open between the hosts:

  • The swarm management port, usually 2377/tcp (most likely already open)
  • The overlay control port 7946/tcp and 7946/udp
  • The overlay data port 4789/udp
  • IP protocol 50 (IPsec ESP) if your overlay networks are defined as "secure", i.e. encrypted (that's a protocol, not a port, so iptables -A INPUT -p 50 -j ACCEPT)
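The list above could be turned into iptables rules along these lines. This is only a sketch, assuming plain iptables with no firewall frontend managing the rules and the default swarm ports; the function name `allow_swarm_peer` is made up for illustration.

```shell
# Allow swarm traffic from one peer node. Run as root on every node,
# once per peer. $1: peer address, $2: pass "encrypted" to also allow ESP.
allow_swarm_peer() {
    iptables -A INPUT -s "$1" -p tcp --dport 2377 -j ACCEPT  # swarm management
    iptables -A INPUT -s "$1" -p tcp --dport 7946 -j ACCEPT  # overlay control
    iptables -A INPUT -s "$1" -p udp --dport 7946 -j ACCEPT  # overlay control
    iptables -A INPUT -s "$1" -p udp --dport 4789 -j ACCEPT  # overlay data (VXLAN)
    if [ "${2:-}" = "encrypted" ]; then
        iptables -A INPUT -s "$1" -p 50 -j ACCEPT            # ESP, protocol 50
    fi
}

# Usage:
#   allow_swarm_peer 10.0.0.2
#   allow_swarm_peer 10.0.0.3 encrypted
```

Note that -A appends to the chain, so if INPUT already contains an earlier DROP or REJECT rule that matches this traffic, the new rules would have to be inserted before it (-I) instead.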

If that doesn't help, look into using netshoot to debug where the traffic is getting stopped.
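A typical way to use netshoot is to attach it to the network namespace of an affected container. A minimal sketch, assuming the `nicolaka/netshoot` image from Docker Hub; the wrapper name `netshoot_in` is hypothetical:

```shell
# Start a netshoot shell inside the network namespace of a running container,
# so tools like ping, dig and tcpdump see exactly what that container sees.
# $1: name or ID of the container to attach to.
netshoot_in() {
    docker run -it --rm --network "container:$1" nicolaka/netshoot
}

# Usage:
#   netshoot_in my_service_container
```

From that shell, pinging the overlay IP of a container on the other host shows whether the data path works at all.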

Upvotes: 4
