tomer.z

Reputation: 1063

Kafka partition index file

So I was reading this: https://thehoard.blog/how-kafkas-storage-internals-work-3a29b02e026

about Kafka's storage internals, and I came up with two questions:

  1. The offset in the index file seems to increase monotonically, so why do we need to save it at all? Why not use the index of the line in the file as the offset (0-based) instead, and reduce the file size by half?

  2. If I understood correctly, the position that is saved in the log file is the position of that message inside the partition (basically, its index). The position saved in the index file is that same position, right? For fast access: for example, if I want message 6 of the partition, I look for 6 using binary search in the index file, and using the offset in that same entry, I go to that line in the log file? (For example, if position 6 has index 7, then I go to line 7 in the log file.)

Upvotes: 1

Views: 1507

Answers (1)

Lalit

Reputation: 2014

This is an interesting one!

My understanding of Kafka's internals is admittedly limited, but I'll take a crack at it anyway.

For the first question:

I looked at the source code of OffsetIndex.scala - it seems the offset part of each index entry is computed by the method relativeOffset() whenever a new entry is appended. Adding to this, the description in the source code says:

An index that maps offsets to physical file locations for a particular log segment. This index may be sparse: that is it may not hold an entry for all messages in the log.

So, as per the reference article you've shared, it is probably because of this sparse nature of the index that

the offset look up uses binary search to find the nearest offset less than or equal to the target offset

From that explanation it might appear that the offset is simply incremented by one per entry - that is not necessarily the case. For example, I created a topic and looked at the contents of the log and the index.

**Contents of the 000---180.index file** (observe the offsets here - increasing, but NOT consecutive):

offset: 217 position: 4107
offset: 254 position: 8214
offset: 291 position: 12321
offset: 328 position: 16428
offset: 365 position: 20535
offset: 402 position: 24642
offset: 439 position: 28749

**Contents of the 000---180.log file** (observe the offsets here - increasing consecutively):

[Just to keep it easy on the eye, I've used three dots (...) to represent the rows in between the offsets that appear in the index]

offset: 217 position: 4107 CreateTime: 1537550091903 isvalid: true keysize: 0 valuesize: 43 magic: 2 compresscodec: NONE producerId: -1 sequence: -1 isTransactional: false headerKeys: []
offset: 218 position: 4218 CreateTime: 1537550092908 isvalid: true keysize: 0 valuesize: 43 magic: 2 compresscodec: NONE producerId: -1 sequence: -1 isTransactional: false headerKeys: []
offset: 219 position: 4329 CreateTime: 1537550093910 isvalid: true keysize: 0 valuesize: 43 magic: 2 compresscodec: NONE producerId: -1 sequence: -1 isTransactional: false headerKeys: []
...
offset: 253 position: 8103 CreateTime: 1537550127960 isvalid: true keysize: 0 valuesize: 43 magic: 2 compresscodec: NONE producerId: -1 sequence: -1 isTransactional: false headerKeys: []
offset: 254 position: 8214 CreateTime: 1537550128961 isvalid: true keysize: 0 valuesize: 43 magic: 2 compresscodec: NONE producerId: -1 sequence: -1 isTransactional: false headerKeys: []
offset: 255 position: 8325 CreateTime: 1537550129962 isvalid: true keysize: 0 valuesize: 43 magic: 2 compresscodec: NONE producerId: -1 sequence: -1 isTransactional: false headerKeys: []
...
offset: 289 position: 12099 CreateTime: 1537550164007 isvalid: true keysize: 0 valuesize: 43 magic: 2 compresscodec: NONE producerId: -1 sequence: -1 isTransactional: false headerKeys: []
offset: 290 position: 12210 CreateTime: 1537550165008 isvalid: true keysize: 0 valuesize: 43 magic: 2 compresscodec: NONE producerId: -1 sequence: -1 isTransactional: false headerKeys: []
offset: 291 position: 12321 CreateTime: 1537550166009 isvalid: true keysize: 0 valuesize: 43 magic: 2 compresscodec: NONE producerId: -1 sequence: -1 isTransactional: false headerKeys: []
offset: 292 position: 12432 CreateTime: 1537550436878 isvalid: true keysize: 0 valuesize: 43 magic: 2 compresscodec: NONE producerId: -1 sequence: -1 isTransactional: false headerKeys: []
...
offset: 327 position: 16317 CreateTime: 1537550471917 isvalid: true keysize: 0 valuesize: 43 magic: 2 compresscodec: NONE producerId: -1 sequence: -1 isTransactional: false headerKeys: []
offset: 328 position: 16428 CreateTime: 1537550472919 isvalid: true keysize: 0 valuesize: 43 magic: 2 compresscodec: NONE producerId: -1 sequence: -1 isTransactional: false headerKeys: []
offset: 329 position: 16539 CreateTime: 1537550473920 isvalid: true keysize: 0 valuesize: 43 magic: 2 compresscodec: NONE producerId: -1 sequence: -1 isTransactional: false headerKeys: []
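To tie this back to the first question: because the index is sparse, the stored offset cannot be recovered from the entry's line number, yet it can still be stored compactly. As a rough sketch (not Kafka's actual on-disk code), the offset can be kept relative to the segment's base offset, which is what relativeOffset() in OffsetIndex.scala computes; the base offset of 180 and the exact packing layout below are assumptions for illustration:

```python
import struct

# Illustrative sketch only: each index entry stores
# (offset relative to the segment's base offset, byte position),
# 4 bytes each, so a full 8-byte absolute offset never needs writing.
BASE_OFFSET = 180  # hypothetical, suggested by the segment file name

def pack_entry(absolute_offset: int, position: int) -> bytes:
    """Encode one index entry as two big-endian unsigned 32-bit ints."""
    return struct.pack(">II", absolute_offset - BASE_OFFSET, position)

def unpack_entry(entry: bytes) -> tuple:
    """Decode an entry back to (absolute offset, byte position)."""
    relative_offset, position = struct.unpack(">II", entry)
    return (BASE_OFFSET + relative_offset, position)

entry = pack_entry(217, 4107)  # first entry from the .index dump above
# 8 bytes on disk; unpack_entry(entry) recovers (217, 4107)
```

So the answer to "why store the offset at all" is that the entries are sparse (line number 0 holds offset 217, line 1 holds 254, and so on), but the relative encoding still keeps each entry small.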

For the second question:

I think the example above helps clarify this. Judging from the dumps, the position stored in the index is the byte position of that record within the segment log file, not the message's ordinal number in the partition - note that the positions above grow by 111 bytes per message, the size of each record, not by 1. For fetch requests, once binary search finds the nearest indexed offset, the broker goes to that byte position in the segment log and scans forward to the requested offset.
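The two-step lookup (binary search in the sparse index, then a forward scan in the log) can be sketched like this, using the entries from the dumps above; the in-memory list is a simplified stand-in, not the broker's actual code:

```python
import bisect

# Entries copied from the .index dump above:
# (offset, byte position in the .log file)
INDEX = [(217, 4107), (254, 8214), (291, 12321), (328, 16428),
         (365, 20535), (402, 24642), (439, 28749)]

def nearest_entry(target_offset):
    """Return the entry with the greatest offset <= target_offset,
    found by binary search, or None if the target precedes the index."""
    offsets = [off for off, _ in INDEX]
    i = bisect.bisect_right(offsets, target_offset) - 1
    return INDEX[i] if i >= 0 else None

# A fetch for offset 290 lands on the entry for offset 254; the broker
# then reads the segment log from byte position 8214, scanning record
# by record until it reaches offset 290.
```

For example, `nearest_entry(290)` returns `(254, 8214)`, and an exact hit such as `nearest_entry(217)` returns its own entry `(217, 4107)`.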

I hope this helps!

Upvotes: 2
