Reputation: 846
I reconfigured my Kafka cluster, changing (among other things) the broker IDs.
So after restarting all the nodes, the cluster seemed OK, but then I noticed that all the topics were failing to come online. In the logs there are messages like this for each topic:
state-change.log: [2018-02-01 12:41:42,176] ERROR Controller 826437096 epoch 19 initiated state change for partition [filedrop,0] from OfflinePartition to OnlinePartition failed (state.change.logger)
So none of the topics are usable; listing them with kafkacat -L -b lol-045:9092 shows that the leaders are not available:
Metadata for all topics (from broker -1: lol-045:9092/bootstrap):
 7 brokers:
  broker 826437096 at lol-044:9092
  broker 746155422 at lol-047:9092
  broker 651737161 at lol-046:9092
  broker 728512596 at lol-048:9092
  broker 213763378 at lol-045:9092
  broker 622553932 at lol-049:9092
  broker 746727274 at lol-050:9092
 14 topics:
  topic "lol.stripped" with 3 partitions:
    partition 2, leader -1, replicas:, isrs:, Broker: Leader not available
    partition 1, leader -1, replicas:, isrs:, Broker: Leader not available
    partition 0, leader -1, replicas:, isrs:, Broker: Leader not available
However, newly created topics are correctly replicated and healthy, for example:
topic "lol-kafka-health" with 3 partitions:
partition 2, leader 622553932, replicas: 622553932,213763378,651737161, isrs: 622553932,213763378,651737161
partition 1, leader 213763378, replicas: 622553932,213763378,826437096, isrs: 213763378,826437096,622553932
partition 0, leader 826437096, replicas: 213763378,746727274,826437096, isrs: 826437096,746727274,213763378
So I think some kind of metadata corruption happened during the reconfiguration.
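One way to check this is to compare the replica assignment stored in ZooKeeper against the broker IDs that are currently registered; a rough sketch (the ZooKeeper address lol-045:2181 is just an assumption):

    # list the broker IDs currently registered in the cluster
    bin/zookeeper-shell.sh lol-045:2181 ls /brokers/ids

    # dump the replica assignment of one of the broken topics; if the
    # "partitions" map still references the old broker IDs, no leader
    # can be elected
    bin/zookeeper-shell.sh lol-045:2181 get /brokers/topics/lol.stripped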
My question is: is there any way to bring these topics back online under the new broker IDs? In addition, are there some procedures I can use to investigate how recoverable these topics are?
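For example, something along these lines (the log.dirs path is only illustrative) would confirm that the raw segment data is still intact on disk:

    # on one of the brokers, check that the partition directories and
    # segment files still exist under log.dirs
    ls -l /var/lib/kafka/data/lol.stripped-0/

    # sanity-check a segment file with the stock DumpLogSegments tool
    bin/kafka-run-class.sh kafka.tools.DumpLogSegments \
      --files /var/lib/kafka/data/lol.stripped-0/00000000000000000000.log \
      --print-data-log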
Many thanks in advance!
Upvotes: 1
Views: 4375
Reputation: 846
The procedure described here allowed me to re-assign the leaderless partitions to leaders with the new broker IDs.
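In outline, that kind of reassignment can be done with the stock kafka-reassign-partitions.sh tool; the sketch below is only illustrative (the ZooKeeper address, file name, and replica placement are assumptions), mapping the dead partitions onto the new broker IDs:

    reassign.json (replica lists reference the new broker IDs):

    {"version":1,"partitions":[
      {"topic":"lol.stripped","partition":0,"replicas":[826437096,213763378,651737161]},
      {"topic":"lol.stripped","partition":1,"replicas":[213763378,651737161,622553932]},
      {"topic":"lol.stripped","partition":2,"replicas":[651737161,622553932,746155422]}
    ]}

    # apply the reassignment, then poll until it reports completion
    bin/kafka-reassign-partitions.sh --zookeeper lol-045:2181 \
      --reassignment-json-file reassign.json --execute
    bin/kafka-reassign-partitions.sh --zookeeper lol-045:2181 \
      --reassignment-json-file reassign.json --verify

After the reassignment completes, a preferred replica election (kafka-preferred-replica-election.sh) may also be needed before the new replicas take over leadership.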
Upvotes: 0