Reputation: 291
I have a strange issue that I cannot figure out how to resolve. I have a Kafka Streams app (2.1.0) that reads from a topic with around 40 partitions. The topic uses a range partitioning policy, so at the moment some of the partitions can be completely empty.
My issue is that during downtime of the app, one of those empty partitions became active and a number of events were written to it. When the app was restored, it read all the events from the other partitions but ignored the events already stored in the previously empty partition (the app has OffsetResetPolicy LATEST for the specific topic). On top of that, when newer messages arrived on that partition, it did consume them, skipping the earlier ones.
My assumption is that __consumer_offsets does not have any entry for that partition when the app is restored, but how can I avoid this situation without losing events? I mean, the topic already exists with the specified number of partitions.
Does this sound familiar to anybody? Am I missing something? Do I need to set some Kafka parameter? I cannot figure out why this is happening.
Upvotes: 0
Views: 862
Reputation: 1418
This is expected behaviour.
Your empty partition does not have a committed offset in __consumer_offsets. If there is no committed offset for a partition, the policy specified in auto.offset.reset is used to decide at which offset to start consuming events.
If auto.offset.reset is set to LATEST, your Streams app will start consuming at the latest offset in the partition, i.e., after the events that were added during the downtime, so it will only consume events that are written to the partition after the downtime.
If auto.offset.reset is set to EARLIEST, your Streams app will start from the earliest offset in the partition and will also read the events written to the partition during the downtime.
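As a minimal sketch of the EARLIEST setting, the properties below are what a Streams app could be started with. Plain string keys are used so that the snippet compiles without the Kafka jars; the application id and broker address are placeholders, not values from the question. If the policy is set per topic in the topology (as the question suggests), that per-topic setting is presumably the place to change it instead.

```java
import java.util.Properties;

// Sketch only: builds the configuration a Kafka Streams app could be started with.
// The key strings are the standard Kafka config names; the values passed in
// main() ("my-streams-app", "localhost:9092") are hypothetical placeholders.
class ResetPolicyConfig {

    static Properties buildConfig(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        // In Kafka Streams the application.id also serves as the consumer group id,
        // so committed offsets in __consumer_offsets are stored under this name.
        props.put("application.id", applicationId);
        props.put("bootstrap.servers", bootstrapServers);
        // Used only for partitions that have no committed offset for this group:
        // "earliest" starts at the beginning of the partition, "latest" at its end.
        props.put("auto.offset.reset", "earliest");
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildConfig("my-streams-app", "localhost:9092");
        System.out.println(props.getProperty("auto.offset.reset"));
    }
}
```

In a real application the same Properties object would be passed to the KafkaStreams constructor together with the topology.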
As @mazaneica mentioned in a comment on your question, auto.offset.reset only affects partitions without a committed offset. So your non-empty partitions will be fine, i.e., the Streams app will resume consuming events from where it stopped before the downtime.
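The rule described above can be sketched as a small decision function. This is an illustrative model, not Kafka's actual code: the class, enum, and method names are made up, and it ignores the case where a committed offset is out of range.

```java
import java.util.OptionalLong;

// Illustrative model of how the starting position of a partition is chosen.
// All names here are hypothetical; this is not part of the Kafka API.
class StartOffsetSketch {

    enum ResetPolicy { EARLIEST, LATEST }

    // committed: the offset stored in __consumer_offsets for this group, if any
    // logStart:  first offset still available in the partition
    // logEnd:    offset that the next written event will receive
    static long startOffset(OptionalLong committed, ResetPolicy policy,
                            long logStart, long logEnd) {
        if (committed.isPresent()) {
            // A committed offset always wins; the reset policy is not consulted.
            return committed.getAsLong();
        }
        return policy == ResetPolicy.EARLIEST ? logStart : logEnd;
    }

    public static void main(String[] args) {
        // Partition that was empty before the downtime and received 10 events
        // (offsets 0..9) while the app was down: no committed offset exists.
        System.out.println(startOffset(OptionalLong.empty(), ResetPolicy.LATEST, 0, 10));
        System.out.println(startOffset(OptionalLong.empty(), ResetPolicy.EARLIEST, 0, 10));
        // Partition that was already being consumed: committed offset 42 is used.
        System.out.println(startOffset(OptionalLong.of(42), ResetPolicy.LATEST, 0, 50));
    }
}
```

With LATEST, the previously empty partition starts at offset 10, so the ten events written during the downtime are skipped. With EARLIEST it starts at 0 and they are read.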
Upvotes: 2