How Kafka Topic Offsets Work

Question

I have a question about how topic offsets work in Kafka: are they stored in a B-Tree-like structure?

The specific reason I ask: say I have a topic with 10 million records. That means 10 million offsets if no compaction has occurred or compaction is turned off. Now, if I call consumer.seek(5000000), will it work like a linked list, meaning it goes to offset 0 and hops from there to offset 5000000, or is there an index-like structure that tells it exactly where the 5000000th record is in the log?

Thanks for your answers.

Mickael Maison · Accepted Answer

Kafka records are stored sequentially in the logs. The exact format is well described in the documentation.

Kafka usually expects reads to be sequential, as consumers fetch records in order. However, when random access is required (via seek or to restart from a specific position), Kafka uses index files to quickly find a record based on its offset.

A Kafka log is made of several segments. Each segment has two associated files: an index file, which maps offsets to file positions, and a timeindex file, which maps timestamps to offsets. The indexes are sparse: the frequency at which entries are added can be configured using index.interval.bytes. Using these files, Kafka can seek directly to a nearby position and scan forward from there, which avoids re-reading all messages.
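
To make the lookup concrete, here is a minimal Python sketch of a sparse offset index. It is a toy model and not Kafka's actual implementation: the function names (build_segment, build_sparse_index, lookup), the fixed 100-byte record size, and the in-memory lists are all assumptions made for illustration. INTERVAL_BYTES only plays the role of index.interval.bytes.

```python
from bisect import bisect_right

RECORD_SIZE = 100      # assumed fixed record size in bytes (real records vary)
INTERVAL_BYTES = 4096  # stands in for index.interval.bytes


def build_segment(num_records):
    """Model one segment as (offset, file_position) pairs in log order."""
    return [(offset, offset * RECORD_SIZE) for offset in range(num_records)]


def build_sparse_index(segment):
    """Index a record only after more than INTERVAL_BYTES were appended
    since the previous index entry, so most records have no entry."""
    index = []
    bytes_since_last_entry = 0
    for offset, position in segment:
        if bytes_since_last_entry > INTERVAL_BYTES:
            index.append((offset, position))
            bytes_since_last_entry = 0
        bytes_since_last_entry += RECORD_SIZE
    return index


def lookup(segment, index, target_offset):
    """Binary-search the index for the last entry at or before the target,
    then scan forward from that position. Returns (position, records_scanned)."""
    i = bisect_right(index, (target_offset, float("inf"))) - 1
    start_position = index[i][1] if i >= 0 else 0  # no entry: start of segment
    # Position maps to a list slot only because records are fixed-size here.
    start_slot = start_position // RECORD_SIZE
    scanned = 0
    for offset, position in segment[start_slot:]:
        scanned += 1
        if offset == target_offset:
            return position, scanned
    return None, scanned


segment = build_segment(10_000)
index = build_sparse_index(segment)
position, scanned = lookup(segment, index, 5_000)
print(f"index entries: {len(index)}, position: {position}, scanned: {scanned}")
```

In this model, finding a record in the middle of the segment costs one binary search over the index plus a forward scan of at most a few dozen records, instead of walking from offset 0. A smaller interval means a larger index and shorter scans; a larger interval means the opposite.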

You may have noticed that, after an unclean shutdown, Kafka spends a few minutes rebuilding indexes. It is these indexes, used for file position lookups, that are being rebuilt.
