Niraj

Reputation: 113

How efficient are Kafka EARLIEST and Kafka LATEST offset resets?

Problem

I am thinking about implementing binary search to find a starting offset for time-based event replaying. In order to do so I was thinking about using EARLIEST to find the beginning offset and LATEST to find the latest offset. After that I could implement binary search to find what offset I need to start replaying from.

Question

I was wondering how efficient seeking to EARLIEST and LATEST is, and how it is implemented. Does Kafka simply use the znode timestamp in the topic's directory and find the file with the latest timestamp to look at? That would be my guess, but I'm just shooting in the dark there.

Thank you in advance!

Upvotes: 0

Views: 3071

Answers (1)

vanekjar

Reputation: 2406

If you are on Kafka 0.10, this problem has already been solved for you. Since Kafka 0.10, each message can carry a timestamp, which makes accurate searching possible. Kafka maintains a timestamp-based index (added in 0.10.1, together with the consumer lookup method) that lets users find an offset based on time.

Kafka 0.10
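You can look up the offset for a given timestamp with KafkaConsumer#offsetsForTimes, which returns the earliest offset whose timestamp is greater than or equal to the one you pass in. You can then seek the consumer to that offset.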
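
To make that lookup rule concrete, here is a minimal sketch of the binary search the question describes, run against an in-memory array instead of a real partition. It is not Kafka's implementation: the class name, the method `firstOffsetAtOrAfter`, and the `baseOffset` parameter are all hypothetical, and it assumes timestamps never decrease with offset, which Kafka does not guarantee for producer-assigned timestamps.

```java
import java.util.OptionalLong;

public class TimestampOffsetSearch {

    /**
     * Hypothetical stand-in for one partition: timestamps[i] is the timestamp
     * (in ms) of the message at offset baseOffset + i.
     * Assumption: timestamps are non-decreasing.
     * Returns the earliest offset whose timestamp is >= targetMs,
     * or empty if every message is older than targetMs.
     */
    public static OptionalLong firstOffsetAtOrAfter(long[] timestamps, long baseOffset, long targetMs) {
        int lo = 0;
        int hi = timestamps.length; // search the half-open range [lo, hi)
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (timestamps[mid] < targetMs) {
                lo = mid + 1; // message at mid is too old, answer lies to the right
            } else {
                hi = mid;     // message at mid qualifies, keep looking to the left
            }
        }
        return lo == timestamps.length
                ? OptionalLong.empty()
                : OptionalLong.of(baseOffset + lo);
    }

    public static void main(String[] args) {
        long[] ts = {1000L, 1500L, 1500L, 2200L, 3000L}; // offsets 40..44
        System.out.println(firstOffsetAtOrAfter(ts, 40L, 1500L)); // first of the two 1500s: offset 41
        System.out.println(firstOffsetAtOrAfter(ts, 40L, 2500L)); // next newer message: offset 44
        System.out.println(firstOffsetAtOrAfter(ts, 40L, 5000L)); // nothing new enough: empty
    }
}
```

With a real consumer you would not need to write this yourself; the point is only to show what "offset for a timestamp" means when several messages share a timestamp or none match.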

Kafka 0.9 and earlier
Messages carry no timestamp. You can't seek accurately, but you can at least get an approximate offset from before a given timestamp. For that you need to use Kafka's Simple API. I recommend reading more about this in the blog post A Closer Look at Kafka OffsetRequest.

Upvotes: 3
