milkamar

Reputation: 453

Docker, Kafka - replication doesn't work between remote brokers

I have Docker images of Kafka brokers and ZooKeeper - call them z1, b1, and b2 for now. They are deployed on two physical servers, s1 and s2, as follows:
s1 contains z1 and b1
s2 contains b2

In their own docker-compose.yml files, ZooKeeper has its ports mapped as follows:

- 2181:2181
- 2888:2888
- 3888:3888

and the brokers as follows (a combined sketch is shown after these lists):

- 9092:9092

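For context, the relevant parts of the two compose files boil down to something like this (service and image names are placeholders, not the exact ones I use):

# s1 compose file: z1 and b1 (the s2 file contains only the kafka service, for b2)
services:
  zookeeper:
    image: <zookeeper_image>
    ports:
      - 2181:2181
      - 2888:2888
      - 3888:3888
  kafka:
    image: <kafka_image>
    ports:
      - 9092:9092
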
A topic with --replication-factor 2 and --partitions 4 can be created.
No data is pushed to the topic at any point, but the following problem still occurs.
If kafka-topics --describe --topic <name_of_topic> --zookeeper <zookeeperIP:port> is run shortly after topic creation, everything is in sync and looks good.
On a second run (after a short delay), b1 drops the b2 replicas from the in-sync replica set (ISR) of its partitions, but b2 does not drop the b1 replicas from the ISR of its partitions.

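For reference, the topic is created and inspected with commands along these lines (topic name and ZooKeeper address are placeholders):

kafka-topics --create --zookeeper <zookeeperIP:port> --replication-factor 2 --partitions 4 --topic <name_of_topic>
kafka-topics --describe --zookeeper <zookeeperIP:port> --topic <name_of_topic>
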
In server.log on b1, many of these exceptions show up:

WARN [ReplicaFetcherThread-0-1], Error in fetch kafka.server.ReplicaFetcherThread$FetchRequest@42746de3 (kafka.server.ReplicaFetcherThread)
java.io.IOException: Connection to ef447651b07a:9092 (id: 1 rack: null) failed
    at kafka.utils.NetworkClientBlockingOps$.awaitReady$1(NetworkClientBlockingOps.scala:83)
    at kafka.utils.NetworkClientBlockingOps$.blockingReady$extension(NetworkClientBlockingOps.scala:93)
    at kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:248)
    at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:238)
    at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)

Swapping leadership between brokers b1 and b2 works when they are shut down and started again, but then only the last broker to come online is in full control of the topic - it is the leader for all partitions and the only in-sync replica, even if the other broker comes back online.

I tried cleaning all data and resetting both brokers and ZooKeeper, but the problem persists.

Why aren't the partitions properly replicated?

Upvotes: 0

Views: 686

Answers (2)

milkamar

Reputation: 453

I figured it out. There was a problem with the network, as Michael G. Noll said.
First, I don't map ports manually anymore and use the host network instead. It's easier to manage.
Second, b1 and b2 had their listeners set like so:

listeners=PLAINTEXT://:9092

Neither of them had an IP specified, so 0.0.0.0 was used by default and there was a collision, as they both listened there and pushed the same connection information to ZooKeeper.

Final configuration:
b1 and b2 docker-compose.yml use host network:

network_mode: "host"

b1 server.properties - listeners:

listeners=PLAINTEXT://<s1_IP>:9092

b2 server.properties - listeners:

listeners=PLAINTEXT://<s2_IP>:9092

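Put together, a broker's compose file and listener now boil down to something like this for b1 (the image name and config mount path here are illustrative, not the exact ones I use; b2 is identical except for <s2_IP>):

services:
  kafka:
    image: <kafka_image>
    network_mode: "host"
    volumes:
      - ./server.properties:/etc/kafka/server.properties   # sets listeners=PLAINTEXT://<s1_IP>:9092

With host networking there is no ports: section anymore - the broker binds directly on the host's <s1_IP>:9092.
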
Everything works fine now: replication works, even across broker restarts, and data can be produced and consumed correctly.

Upvotes: 0

miguno

Reputation: 15087

It looks like the brokers b1 and b2 can't talk to each other, which indicates a Docker-related networking issue (and such Docker networking issues are quite common in general).

You'd need to share more information for further help, e.g. the contents of the docker-compose.yml file(s) as well as the Dockerfile you use to create your images. I also wonder why you have created different images for the two brokers; typically you only need a single Kafka broker image, and then simply launch multiple containers (one per desired broker) off of that image.

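For example, a single-image setup could be sketched like this (shown on one host just for illustration; image name and config paths are placeholders), with the two broker containers differing only in the configuration handed to them:

services:
  kafka1:
    image: <your_kafka_image>    # one image ...
    volumes:
      - ./b1/server.properties:/etc/kafka/server.properties
  kafka2:
    image: <your_kafka_image>    # ... reused for the second broker
    volumes:
      - ./b2/server.properties:/etc/kafka/server.properties
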
Upvotes: 2
