Anshita Singh

Reputation: 1884

What consumer offset will be set if auto.offset.reset=earliest but topic has no messages

I am running Kafka server version 2.4 with log.retention.hours=168 (so that messages in the topic are deleted after 7 days) and auto.offset.reset=earliest (so that a consumer that cannot find its last committed offset starts processing from the beginning). Since I am on Kafka 2.4 and do not set offsets.retention.minutes in my application, it takes its default value of 10080 (7 days).

My Topic data is : 1,2,3,4,5,6,7,8,9,10

current consumer offset before shutting down consumer: 10

End offset:10

last committed offset by consumer: 10

Suppose my consumer has not been running for the past 7 days and I start it on the 8th day. By then the last offset committed by the consumer will have expired (due to offsets.retention.minutes=10080) and the topic's messages will also have been deleted (due to log.retention.hours=168).

So I want to know: which consumer offset will auto.offset.reset=earliest set now?

Upvotes: 8

Views: 6103

Answers (2)

Michael Heil

Reputation: 18525

Although no data is available in the Kafka topic, your brokers still know the "next" offset within that partition. In your case, both the first and the last offset of this partition are 10, even though it contains no data.

Therefore, your consumer, which has already committed offset 10, will try to read from offset 10 when started again (a committed offset marks the next record to be read), independent of the consumer configuration auto.offset.reset.

Your example gets even more interesting if the topic had offsets up to, say, 15 while the consumer was shut down after committing offset 10. Now imagine that all offsets were removed from the topic due to the retention policy. Only if you then start your consumer does the consumer configuration auto.offset.reset come into effect, as stated in the documentation:

"What to do when there is no initial offset in Kafka or if the current offset does not exist any more on the server (e.g. because that data has been deleted)"

As long as the Kafka topic is empty there is no offset "set" for the consumer. The consumer just tries to find the next available offset, either based on

  • the last committed offset or,
  • in case the last committed offset does not exist anymore, the configuration given through auto.offset.reset.
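The two cases above can be sketched as a small decision function. This is only a simplified model of the behaviour described in this answer, not Kafka client code: the function name resolve_start_offset and its parameters are made up for illustration, and log_start / log_end stand for the first and the "next" offset the broker reports for the partition.

```python
from typing import Optional


def resolve_start_offset(
    committed: Optional[int],
    log_start: int,
    log_end: int,
    auto_offset_reset: str = "earliest",
) -> int:
    """Return the offset a consumer would start fetching from (simplified model).

    committed  -- last committed offset, or None if it has expired or never existed
    log_start  -- first offset still known to the broker for the partition
    log_end    -- "next" offset of the partition (equals log_start when it is empty)
    """
    # Case 1: a committed offset exists and still lies inside the partition's range.
    if committed is not None and log_start <= committed <= log_end:
        return committed

    # Case 2: no usable committed offset, so auto.offset.reset decides.
    if auto_offset_reset == "earliest":
        return log_start
    if auto_offset_reset == "latest":
        return log_end
    # With auto.offset.reset=none the consumer raises an error instead of resetting.
    raise LookupError("no valid offset and auto.offset.reset=none")


# Empty partition whose first and next offset are both 10:
still_committed = resolve_start_offset(committed=10, log_start=10, log_end=10)
expired_commit = resolve_start_offset(committed=None, log_start=10, log_end=10)

# Committed offset 10, but everything up to offset 15 was deleted by retention:
out_of_range = resolve_start_offset(committed=10, log_start=16, log_end=16)
```

In this model all three calls end up at the partition's only remaining position: 10 for the empty partition from the question, and 16 for the variant where offsets up to 15 were removed.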

Just as an additional note: even though the messages seem to get cleaned up by the retention policy, you may still see some data in the topic, as explained in "Data still remains in Kafka topic even after retention time/size".

Upvotes: 2

Ryuzaki L

Reputation: 40078

Once the consumer group's committed offsets are deleted from the log, auto.offset.reset takes precedence and the consumers start consuming data from the beginning.

My Topic data is : 1,2,3,4,5,6,7,8,9,10

If the topic contains the above data, the consumer will start from the beginning and consume all records from 1 to 10.

My Topic data is : 11,12,13,14,15,16,17,18,19,20

In this case, if the old data has been purged due to retention, the consumer resets its offset to earliest (the earliest offset available at that time) and starts consuming from there. In this scenario it consumes all records from 11 to 20, since 1 to 10 have been purged.
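To make the purge scenario concrete, here is a toy in-memory model of a single partition. It is only an illustration under assumptions: PartitionLog and its methods are hypothetical and not part of any Kafka API, offsets are assumed to start at 0, and retention is modelled as simply dropping everything below a given offset.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PartitionLog:
    """Toy model of one partition: offsets only grow, retention moves log_start."""

    log_start: int = 0  # earliest offset still available
    log_end: int = 0    # offset the next appended record will get
    records: Dict[int, int] = field(default_factory=dict)

    def append(self, value: int) -> None:
        self.records[self.log_end] = value
        self.log_end += 1

    def purge_before(self, offset: int) -> None:
        """Model retention: drop every record with an offset below `offset`."""
        offset = min(offset, self.log_end)
        for o in range(self.log_start, offset):
            self.records.pop(o, None)
        self.log_start = max(self.log_start, offset)

    def read_from(self, offset: int) -> List[int]:
        return [self.records[o] for o in range(offset, self.log_end)]


log = PartitionLog()
for value in range(1, 11):       # records 1..10 land at offsets 0..9
    log.append(value)

log.purge_before(10)             # retention removes records 1..10
empty_read = log.read_from(log.log_start)

for value in range(11, 21):      # records 11..20 land at offsets 10..19
    log.append(value)

# A consumer without a committed offset and auto.offset.reset=earliest
# starts at log_start, which is now 10 rather than 0.
consumed = log.read_from(log.log_start)
```

After the purge the earliest available offset is 10, so a reset to earliest yields only the records 11 to 20; while the partition is still empty, the same reset simply returns nothing.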

Upvotes: 0
