Reputation: 7346
In our production environment, we often see that the partitions go under-replicated while consuming the messages from the topics. We are using Kafka 0.11. From the documentation what is understand is
Configuration parameter replica.lag.max.messages
was removed. Partition leaders will no longer consider the number of lagging messages when deciding which replicas are in sync.
Configuration parameter replica.lag.time.max.ms
now refers not just to the time passed since last fetch request from the replica, but also to time since the replica last caught up. Replicas that are still fetching messages from leaders but did not catch up to the latest messages in replica.lag.time.max.ms
will be considered out of sync.
How do we fix this issue? What are the different reasons for replicas go out of sync? In our scenario, we have all the Kafka brokers in the single RACK of the blade servers and all are using the same network with 10GBPS Ethernet(Simplex). I do not see any reason for the replicas to go out of sync due to the network.
Upvotes: 11
Views: 52776
Reputation: 740
I faced the same issue on Kafka 2.0, On restart Kafka controller node everything caught-up on the replicas.
But still looking for the reasons why few partitions are under-replicated whereas the other partitions on the same nodes for the same topic works good, and this issue i see on a random partitions.
Upvotes: 1
Reputation: 1957
Do NOT run reassignment for all topics together, consider running it for small portions.
unclean.leader.election.enable
to true
for this topic.Preferred Replica Election
(in yahoo/kafka-manager or manually).Repeat for the rest of topics that have the same problem.
Also I tried this advice, it didn't help me: https://stackoverflow.com/a/51063607/1929406
Upvotes: 0
Reputation: 468
We faced the same issue:
Solution was:
No data lose.
The issue is due to a faulty state in ZK, there was an opened issue on ZK for this, don't remember the number.
Upvotes: 16