Reputation: 453
I have Docker images of Kafka brokers and ZooKeeper - call them z1, b1 and b2 for now. They are deployed on two physical servers, s1 and s2, like so:
s1 contains z1 and b1
s2 contains b2
In their own docker-compose.yml files, ZooKeeper has its ports mapped as follows:
- 2181:2181
- 2888:2888
- 3888:3888
and the brokers as follows:
- 9092:9092
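For reference, a minimal sketch of what the docker-compose.yml on s1 might look like with these mappings (the image names and the compose file version are assumptions, not taken from the question; s2 would carry only the broker service):

version: "2"
services:
  zookeeper:
    image: some-zookeeper-image   # assumed image name
    ports:
      - "2181:2181"
      - "2888:2888"
      - "3888:3888"
  kafka:
    image: some-kafka-image       # assumed image name
    ports:
      - "9092:9092"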
A topic with --replication-factor 2 and --partitions 4 can be created successfully.
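For context, the creation command was presumably something along these lines (the topic name and ZooKeeper address are the same placeholders used below):

kafka-topics --create --zookeeper <zookeeperIP:port> --replication-factor 2 --partitions 4 --topic <name_of_topic>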
No data is pushed to the topic at any point, but the following problem still occurs.
If kafka-topics --describe --topic <name_of_topic> --zookeeper <zookeeperIP:port> is run shortly after topic creation, everything is in sync and looks good.
On a second run (after a short delay), b1 removes b2's partition replicas from its ISR, but b2 does not remove b1's partition replicas from its ISR.
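Illustratively, and only as a sketch (the broker ids and exact column layout are assumed here, with 0 standing for b1 and 1 for b2), the second --describe run would show the ISR shrunk to just the leader for the partitions led by b1, while the partitions led by b2 still list both replicas:

Topic: <name_of_topic>  Partition: 0  Leader: 0  Replicas: 0,1  Isr: 0
Topic: <name_of_topic>  Partition: 1  Leader: 1  Replicas: 1,0  Isr: 1,0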
The server.log on b1 shows many of these exceptions:
WARN [ReplicaFetcherThread-0-1], Error in fetch kafka.server.ReplicaFetcherThread$FetchRequest@42746de3 (kafka.server.ReplicaFetcherThread)
java.io.IOException: Connection to ef447651b07a:9092 (id: 1 rack: null) failed
at kafka.utils.NetworkClientBlockingOps$.awaitReady$1(NetworkClientBlockingOps.scala:83)
at kafka.utils.NetworkClientBlockingOps$.blockingReady$extension(NetworkClientBlockingOps.scala:93)
at kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:248)
at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:238)
at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
Leadership does swap between brokers b1 and b2 as they are shut down and started again, but then only the last one to come online has full control of the topic - it is the leader for all partitions and the only replica in the ISR, even if the other broker comes back online.
I tried cleaning all data and resetting both brokers and ZooKeeper, but the problem persists.
Why aren't the partitions properly replicated?
Upvotes: 0
Views: 686
Reputation: 453
I figured it out. There was a problem with the network, as Michael G. Noll said.
First, I don't map ports manually anymore and use the host network instead. It's easier to manage.
Second, b1 and b2 had their listeners set like so:
listeners=PLAINTEXT://:9092
Neither of them had an IP specified, so 0.0.0.0 was used by default. This caused a collision, because both brokers listened there and pushed the same connection information to ZooKeeper.
Final configuration:
b1 and b2 docker-compose.yml use the host network:
network_mode: "host"
b1 server.properties - listeners:
listeners=PLAINTEXT://<s1_IP>:9092
b2 server.properties - listeners:
listeners=PLAINTEXT://<s2_IP>:9092
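Putting it together, a rough sketch of the final setup on s1 (the image name, mount path, broker.id and zookeeper.connect values are assumptions; s2 is analogous with its own IP and a different broker.id):

docker-compose.yml:
  services:
    kafka:
      image: some-kafka-image                                   # assumed image name
      network_mode: "host"                                      # no manual port mappings
      volumes:
        - ./server.properties:/etc/kafka/server.properties      # assumed path inside the image

server.properties:
  broker.id=0
  listeners=PLAINTEXT://<s1_IP>:9092
  zookeeper.connect=<s1_IP>:2181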
Everything works fine now: replication works even across broker restarts, and data can be produced and consumed correctly.
Upvotes: 0
Reputation: 15087
It looks like the brokers b1 and b2 can't talk to each other, which indicates a Docker-related networking issue (and such Docker networking issues are quite common in general).
You'd need to share more information for further help, e.g. the contents of the docker-compose.yml file(s) as well as the Dockerfile you use to create your images. I also wonder why you have created different images for the two brokers; typically you only need a single Kafka broker image and then simply launch multiple containers (one per desired broker) off of that image.
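As a sketch of that last point (the image name and config paths are assumptions, and in a setup spanning two servers the two services would of course live in separate compose files), a single image can back both brokers, with only the mounted configuration differing:

services:
  b1:
    image: my-kafka-image                                   # one shared image
    volumes:
      - ./b1/server.properties:/etc/kafka/server.properties
  b2:
    image: my-kafka-image                                   # same image, different config
    volumes:
      - ./b2/server.properties:/etc/kafka/server.properties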
Upvotes: 2