I have a Spring Kafka consumer application that runs in Kubernetes (K8s). Sometimes the application is recycled/restarted. When the consumer comes back up, I want it to consume all the messages that were produced while it was restarting. I experimented with `auto.offset.reset=earliest` and it works as expected, but I noticed that Kafka's default value is `latest`.
What are the risks if I use `earliest`? In what scenarios should I choose `latest` vs. `earliest`? I tried to find a post here that explains this with a scenario, but most of them were copy-pasted from documentation rather than based on real-life examples.
That property only applies if the broker has no committed offset for the group/topic/partition.
That is, the first time the app runs, or if the offsets expire. With modern brokers, committed offsets expire by default 7 days after the last consumer leaves the group (i.e., when the app hasn't run for 7 days). With older brokers, offsets expired much sooner, even while the consumer was still running, if it hadn't received (and so hadn't committed) anything. The current behavior started with 2.1, IIRC.
When there is already a committed offset, that is used for the start position, and this property is ignored.
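To make the rule concrete, here is a simplified model (not Kafka's actual client code) of how the start position for one partition is chosen. The class and parameter names are hypothetical: `logStart` stands for the earliest offset still in the partition and `logEnd` for the offset the next record will get. It ignores the case where a committed offset points at data already deleted by log retention.

```java
import java.util.OptionalLong;

// Simplified, hypothetical model of choosing a partition's start offset
// when a consumer (re)joins its group. Not Kafka's real implementation.
class StartPosition {
    enum ResetPolicy { EARLIEST, LATEST }

    // committed: the group's committed offset for this partition, if the broker still has one
    // logStart:  earliest offset still available in the partition
    // logEnd:    offset that the next produced record will receive
    static long resolve(OptionalLong committed, long logStart, long logEnd, ResetPolicy policy) {
        if (committed.isPresent()) {
            // Existing group (e.g. a restarted pod): resume where it left off.
            // The reset policy is not consulted at all in this branch.
            return committed.getAsLong();
        }
        // New group, or its committed offsets expired: fall back to the policy.
        return policy == ResetPolicy.EARLIEST ? logStart : logEnd;
    }
}
```

A restarted pod with the same group id takes the first branch, so it resumes from its committed offset whether the policy is `earliest` or `latest`; the policy only matters for a new group or one whose offsets have expired.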
For most use cases, `earliest` is used but, as you say, `latest` is the default, which means "new" consumers will start at the end and won't get any records already in the topic.
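A minimal sketch of the relevant settings for a plain Kafka consumer, using the literal property names from the Kafka consumer configuration. The bootstrap address and group id are placeholders supplied by the caller, and the class name is hypothetical. In a Spring Boot app the same setting is usually expressed as `spring.kafka.consumer.auto-offset-reset=earliest`. For the restart scenario in the question, the important part is a stable `group.id`: that is what lets the broker hand back the committed offset after a pod restart.

```java
import java.util.Properties;

// Minimal sketch (hypothetical class name) of consumer settings for the
// restart scenario. String keys are used instead of the ConsumerConfig
// constants so this compiles without the kafka-clients dependency.
class ConsumerSettings {
    static Properties build(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // Keep this stable across restarts so the group's committed offsets are reused.
        props.put("group.id", groupId);
        // Consulted only when the group has no committed offset for a partition.
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }
}
```

The resulting `Properties` would then be passed to `new KafkaConsumer<>(props)` from `kafka-clients`.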
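So the "risk" is that if you don't run your app for a week, its offsets expire and, with `earliest`, it will re-read every record still in the topic (those not yet removed by log retention). You can increase the broker's `offsets.retention.minutes` to avoid that.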
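As a rough illustration of the expiry arithmetic: the broker default for `offsets.retention.minutes` is 10,080 minutes (7 × 24 × 60, i.e. 7 days), and on 2.1+ brokers the clock starts when the group becomes empty. This is a simplified model, not broker code; the class and method names are hypothetical, and the broker removes expired offsets during a periodic cleanup, so the result means "eligible for expiry" rather than "already gone".

```java
import java.time.Duration;
import java.time.Instant;

// Simplified, hypothetical model of the 2.1+ expiry rule: committed offsets
// become eligible for removal once the group has been empty for at least
// offsets.retention.minutes.
class OffsetExpiry {
    // 7 days * 24 hours * 60 minutes: the broker default for offsets.retention.minutes.
    static final long DEFAULT_RETENTION_MINUTES = 7L * 24 * 60;

    static boolean eligibleForExpiry(Instant groupEmptySince, Instant now, long retentionMinutes) {
        long idleMinutes = Duration.between(groupEmptySince, now).toMinutes();
        return idleMinutes >= retentionMinutes;
    }
}
```

With the default, a consumer that is down for 6 days keeps its offsets, while one that is down for 8 days falls back to `auto.offset.reset` on its next start. Raising the value widens that window, at the cost of the broker keeping offsets for abandoned groups longer.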