Reputation: 1403
I'm looking at logs for a problem we recently had with Kafka where we ended up with a full offset rewind. From the logs it looks like two of our three replicas dropped out at the same moment, or at least that is what one node's logs say. Around that time, I see the following log message repeated many times with different partition names:
ERROR [Controller id=0 epoch=71] Controller 0 epoch 71 failed to change state for partition PARTITION.NAME from OnlinePartition to OnlinePartition (state.change.logger)
kafka.common.StateChangeFailedException: Failed to elect leader for partition PARTITION.NAME under strategy PreferredReplicaPartitionLeaderElectionStrategy
at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:328)
at ...
What's odd about this is the part that says
from OnlinePartition to OnlinePartition
When I search Google for this, I come up with nothing really helpful. The other thing is that everything that does come up seems pretty old and refers to pre-1.0 versions of Kafka. We are supposedly running 1.1.0.
Any ideas as to why there would be an attempt to change a partition to the state it already appears to be in? I can see how that could be considered a failure in the sense that nothing would be changed, but it seems nonsensical in general.
Upvotes: 5
Views: 12908
Reputation: 9425
According to the Kafka Controller Internals page, this is a valid state transition when a new partition leader needs to be selected:
Valid state transitions are:
. . .
OnlinePartition, OfflinePartition -> OnlinePartition
select new leader and isr for this partition and a set of replicas to receive the LeaderAndIsr request, and write leader and isr to ZK
...
c. PreferredReplicaPartitionLeaderSelector: new leader = first assigned replica (if in isr); new isr = current isr; receiving replicas = assigned replicas
...
send LeaderAndIsr request to every receiving replica and UpdateMetadata request to every live broker
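In other words, OnlinePartition -> OnlinePartition just means the controller is re-electing a leader for a partition that is already online. Under PreferredReplicaPartitionLeaderElectionStrategy the election can only succeed if the preferred replica (the first replica in the assignment) is currently in the ISR, which would explain the repeated failures if two of your three replicas had just dropped out. Here is a minimal sketch of that rule; this is my own simplified model, not Kafka's actual code:

object PreferredReplicaElectionSketch {
  // Simplified stand-in for the controller's view of a partition.
  case class PartitionState(assignedReplicas: Seq[Int], isr: Set[Int])

  // Preferred-replica rule from the wiki text above: new leader = first
  // assigned replica, but only if it is currently in the ISR.
  def electPreferredLeader(p: PartitionState): Option[Int] = {
    val preferred = p.assignedReplicas.head
    if (p.isr.contains(preferred)) Some(preferred) else None
  }

  def main(args: Array[String]): Unit = {
    // Healthy case: all three replicas in the ISR, broker 1 is elected.
    println(electPreferredLeader(PartitionState(Seq(1, 2, 3), Set(1, 2, 3)))) // Some(1)
    // Two replicas (including the preferred one) dropped out of the ISR:
    // no eligible leader, which is what surfaces as StateChangeFailedException.
    println(electPreferredLeader(PartitionState(Seq(1, 2, 3), Set(3))))       // None
  }
}

So the exception does not mean the partition was in a wrong state; the controller tried to complete the (valid) transition back to OnlinePartition under a new leader and could not find an eligible one.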
EDIT
Regarding the reset offsets: could you check whether KAFKA-6189 applies in your case? If not, please share the configuration details of your cluster, topic, and consumer group.
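While investigating, it may also help to inspect what the group has actually committed. Below is a small sketch using the plain consumer API; the broker address, topic, partition, and group name are placeholders you would substitute with your own:

import java.util.Properties
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition

object CheckCommittedOffset {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")   // placeholder broker
    props.put("group.id", "my-consumer-group")         // placeholder group
    props.put("key.deserializer",
      "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer",
      "org.apache.kafka.common.serialization.StringDeserializer")
    val consumer = new KafkaConsumer[String, String](props)
    try {
      // committed() returns the group's last committed OffsetAndMetadata for
      // the partition, or null if the group has never committed one.
      val committed = consumer.committed(new TopicPartition("my-topic", 0))
      println(s"Committed offset for my-topic-0: $committed")
    } finally consumer.close()
  }
}

Comparing the committed offsets against the partitions' current end offsets before and after an incident can show whether the rewind came from the committed offsets themselves being lost or changed.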
Upvotes: 2