Problem
I am thinking about implementing binary search to find a starting offset for time-based event replay. To do so, I was planning to use EARLIEST to find the first offset and LATEST to find the last offset, and then binary-search between them for the offset I need to start replaying from.
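A minimal sketch of that binary search follows. It assumes record timestamps never decrease as offsets grow, which is not guaranteed when producers set their own timestamps. It also uses a hypothetical `timestampAt` function in place of actually seeking to an offset, polling one record and reading its timestamp. `earliest` and `latest` stand for the offsets that EARLIEST and LATEST resolve to, with `latest` treated as exclusive.

```java
import java.util.function.LongUnaryOperator;

public class OffsetBinarySearch {

    /**
     * Returns the first offset in [earliest, latest) whose timestamp is >= targetTs,
     * or latest if every record is older than targetTs.
     * timestampAt is a hypothetical stand-in for "seek to offset, poll, read the timestamp".
     * Assumes timestamps are non-decreasing with offset.
     */
    static long firstOffsetAtOrAfter(long earliest, long latest, long targetTs,
                                     LongUnaryOperator timestampAt) {
        long lo = earliest;
        long hi = latest; // exclusive upper bound
        while (lo < hi) {
            long mid = lo + (hi - lo) / 2;
            if (timestampAt.applyAsLong(mid) < targetTs) {
                lo = mid + 1; // record at mid is too old, so the answer lies to the right
            } else {
                hi = mid;     // mid qualifies, but an earlier offset might as well
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        // Simulated partition: offsets 100..109 with timestamps 1000, 1010, ..., 1090
        long earliest = 100;
        long latest = 110;
        LongUnaryOperator ts = offset -> 1000 + (offset - earliest) * 10;
        System.out.println(firstOffsetAtOrAfter(earliest, latest, 1035, ts)); // 104
    }
}
```

Each probe costs one seek plus one fetch, so the search needs roughly log2(latest - earliest) round trips to the broker.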
Question
I was wondering how efficient seeking to EARLIEST and LATEST is, and how it is implemented. Does Kafka simply use the znode timestamp in the topic's directory and pick the file with the latest timestamp? That is my guess, but I'm just shooting in the dark.
Thank you in advance!
If you are using Kafka 0.10 or later, this problem has already been solved for you. Since Kafka 0.10, each message can carry a timestamp that can be used for accurate searching, and Kafka maintains a timestamp-based index that lets consumers look up an offset by time.
Kafka 0.10 and later
You can find the offset for a given timestamp with KafkaConsumer#offsetsForTimes (added in 0.10.1) and then move the consumer there with KafkaConsumer#seek.
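Below is a simplified, stdlib-only model of what offsetsForTimes returns: the earliest offset whose timestamp is greater than or equal to the target, or null when no such record exists. The `TimeIndexModel` class, its sparse `TreeMap` index and the `indexInterval` parameter are illustrative assumptions, not Kafka's broker code. The model also assumes timestamps never decrease along the log. It only shows why a time index makes the lookup cheap: jump to a nearby indexed position, then scan forward.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TimeIndexModel {

    /** One record in a simulated partition log. */
    static final class Rec {
        final long offset;
        final long timestamp;
        Rec(long offset, long timestamp) { this.offset = offset; this.timestamp = timestamp; }
    }

    private final List<Rec> log;                                         // records in offset order
    private final TreeMap<Long, Integer> sparseIndex = new TreeMap<>();  // timestamp -> position in log

    TimeIndexModel(List<Rec> log, int indexInterval) {
        this.log = log;
        // Index only every indexInterval-th record, like a sparse on-disk index
        for (int i = 0; i < log.size(); i += indexInterval) {
            sparseIndex.putIfAbsent(log.get(i).timestamp, i);
        }
    }

    /** Earliest offset whose timestamp is >= targetTs, or null if there is none. */
    Long offsetForTime(long targetTs) {
        // Jump to the last indexed record that is strictly older than the target...
        Map.Entry<Long, Integer> start = sparseIndex.lowerEntry(targetTs);
        int pos = (start == null) ? 0 : start.getValue();
        // ...then scan forward linearly to the first record that is new enough
        for (int i = pos; i < log.size(); i++) {
            if (log.get(i).timestamp >= targetTs) {
                return log.get(i).offset;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Rec> log = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            log.add(new Rec(100 + i, 1000 + 10L * i)); // offsets 100..109, timestamps 1000..1090
        }
        TimeIndexModel model = new TimeIndexModel(log, 4); // index every 4th record
        System.out.println(model.offsetForTime(1035)); // 104
        System.out.println(model.offsetForTime(2000)); // null
    }
}
```

With a real consumer, the equivalent is to pass a `Map<TopicPartition, Long>` of target timestamps to `offsetsForTimes`, then call `seek(partition, result.offset())` for each non-null `OffsetAndTimestamp` it returns.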
Kafka 0.9 and earlier
Messages carry no timestamp, so you can't seek accurately, but you can at least get an approximate offset from before a given timestamp. For that you need the Kafka Simple API. I recommend reading the blog post A Closer Look at Kafka OffsetRequest for more on this topic.