milkamar

Reputation: 453

Docker, Kafka - replication doesn't work between remote brokers

I have Docker images of Kafka brokers and ZooKeeper - call them z1, b1, and b2 for now. They are deployed on two physical servers, s1 and s2, as follows:
s1 contains z1 and b1
s2 contains b2

In their own docker-compose.yml files, ZooKeeper has its ports mapped as follows:

- 2181:2181
- 2888:2888
- 3888:3888

and the brokers as follows (a combined sketch is shown after these lists):

- 9092:9092

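For context, the relevant parts of the two compose files boil down to something like this (service and image names are placeholders, not the exact ones I use):

# s1 compose file: z1 and b1 (the s2 file contains only the kafka service, for b2)
services:
  zookeeper:
    image: <zookeeper_image>
    ports:
      - 2181:2181
      - 2888:2888
      - 3888:3888
  kafka:
    image: <kafka_image>
    ports:
      - 9092:9092
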
A topic with --replication-factor 2 and --partitions 4 can be created.
No data is pushed to the topic at any point, but the following problem still occurs.
If kafka-topics --describe --topic <name_of_topic> --zookeeper <zookeeperIP:port> is run shortly after topic creation, everything is in sync and looks good.
On a second run (after a short delay), b1 drops the b2 replicas from the in-sync replica set (ISR) of its partitions, but b2 does not drop the b1 replicas from the ISR of its partitions.

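For reference, the topic is created and inspected with commands along these lines (topic name and ZooKeeper address are placeholders):

kafka-topics --create --zookeeper <zookeeperIP:port> --replication-factor 2 --partitions 4 --topic <name_of_topic>
kafka-topics --describe --zookeeper <zookeeperIP:port> --topic <name_of_topic>
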
In server.log on b1, many of these exceptions show up:

WARN [ReplicaFetcherThread-0-1], Error in fetch kafka.server.ReplicaFetcherThread$FetchRequest@42746de3 (kafka.server.ReplicaFetcherThread)
java.io.IOException: Connection to ef447651b07a:9092 (id: 1 rack: null) failed
    at kafka.utils.NetworkClientBlockingOps$.awaitReady$1(NetworkClientBlockingOps.scala:83)
    at kafka.utils.NetworkClientBlockingOps$.blockingReady$extension(NetworkClientBlockingOps.scala:93)
    at kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:248)
    at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:238)
    at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)

Swapping leadership between brokers b1 and b2 works when they are shut down and started again, but then only the last broker to come online is in full control of the topic - it is the leader for all partitions and the only in-sync replica, even if the other broker comes back online.

I tried cleaning all data and resetting both brokers and ZooKeeper, but the problem persists.

Why aren't the partitions properly replicated?

Upvotes: 0

Views: 686

Answers (2)

milkamar

Reputation: 453

I figured it out. There was a problem with the network, as Michael G. Noll said.
First, I don't map ports manually anymore and use the host network instead. It's easier to manage.
Second, b1 and b2 had their listeners set like so:

listeners=PLAINTEXT://:9092

Neither of them had an IP specified, so 0.0.0.0 was used by default and there was a collision, as they both listened there and pushed the same connection information to ZooKeeper.

Final configuration:
b1 and b2 docker-compose.yml use host network:

network_mode: "host"

b1 server.properties - listeners:

listeners=PLAINTEXT://<s1_IP>:9092

b2 server.properties - listeners:

listeners=PLAINTEXT://<s2_IP>:9092

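Put together, a broker's compose file and listener now boil down to something like this for b1 (the image name and config mount path here are illustrative, not the exact ones I use; b2 is identical except for <s2_IP>):

services:
  kafka:
    image: <kafka_image>
    network_mode: "host"
    volumes:
      - ./server.properties:/etc/kafka/server.properties   # sets listeners=PLAINTEXT://<s1_IP>:9092

With host networking there is no ports: section anymore - the broker binds directly on the host's <s1_IP>:9092.
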
Everything works fine now: replication works, even across broker restarts, and data can be produced and consumed correctly.

Upvotes: 0

miguno

Reputation: 15087

It looks like the brokers b1 and b2 can't talk to each other, which indicates a Docker-related networking issue (and such Docker networking issues are quite common in general).

You'd need to share more information for further help, e.g. the contents of the docker-compose.yml file(s) as well as the Dockerfile you use to create your images. I also wonder why you have created different images for the two brokers; typically you only need a single Kafka broker image, and then simply launch multiple containers (one per desired broker) off of that image.

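For example, a single-image setup could be sketched like this (shown on one host just for illustration; image name and config paths are placeholders), with the two broker containers differing only in the configuration handed to them:

services:
  kafka1:
    image: <your_kafka_image>    # one image ...
    volumes:
      - ./b1/server.properties:/etc/kafka/server.properties
  kafka2:
    image: <your_kafka_image>    # ... reused for the second broker
    volumes:
      - ./b2/server.properties:/etc/kafka/server.properties
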
Upvotes: 2
