Reputation: 911
I'm not looking for an API to accomplish this; I'm interested in the internal implementation details.
I know that recent versions of Kafka store the offsets of each consumer group in a special Kafka topic, __consumer_offsets.
My questions are:
What exactly is the data structure residing in this topic?
When a consumer group dies and comes back up, how does Kafka look up the offset in each topic-partition up to which that consumer group had last consumed?
As far as I understand, Kafka topics are not suited to looking up data, for example with queries like:
Select *offset* from __consumer_offsets where consumer-group-name=*consumer-group* and topic=*topic-1*
Basically, I want to know the internal details of __consumer_offsets, or of anything else used for consumer offset management.
I read this wiki page https://cwiki.apache.org/confluence/display/KAFKA/Offset+Management , but couldn't understand the part about the in-memory data structure.
Upvotes: 3
Views: 871
Reputation: 1293
Every consumer group is assigned a particular partition in the __consumer_offsets topic based on the hash of its group ID.
Then, offsets are simply written as messages to that partition of the __consumer_offsets topic, keyed by consumer group, topic and partition.
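The group-to-partition mapping can be sketched roughly as follows. This is a minimal illustration, not the broker's code: it assumes the broker takes the non-negative Java `hashCode` of the group ID modulo the number of partitions of __consumer_offsets (50 by default via `offsets.topic.num.partitions`). The function names `java_string_hash` and `offsets_partition_for` are made up for this example.

```python
def java_string_hash(s: str) -> int:
    """Emulate Java's String.hashCode() as a signed 32-bit integer."""
    data = s.encode("utf-16-be")  # Java hashes UTF-16 code units
    h = 0
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + unit) & 0xFFFFFFFF  # wrap like 32-bit int arithmetic
    # Reinterpret the unsigned 32-bit value as a signed int
    return h - 0x100000000 if h >= 0x80000000 else h


def offsets_partition_for(group_id: str, num_partitions: int = 50) -> int:
    """Assumed mapping: clear the sign bit of the hash, then take the modulo."""
    return (java_string_hash(group_id) & 0x7FFFFFFF) % num_partitions


if __name__ == "__main__":
    # "my-group" is a hypothetical group ID
    print(offsets_partition_for("my-group"))
```

Because the mapping is deterministic, all commits of one group land in the same partition, so a single broker (the leader of that partition) can serve every offset request for the group.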
To keep this topic from growing too large, it is log-compacted: older offsets for a given consumer group and topic-partition are periodically removed, leaving only the most recent commit.
For reads, the Kafka broker loads this data into memory as part of startup, so that a request for an offset doesn't cause disk I/O. Since only the latest offset per group and topic-partition is needed, in normal operation this doesn't amount to much data to keep in memory.
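To make the in-memory part concrete, here is a rough sketch of how such a cache could be rebuilt by replaying the topic from the beginning. It is an assumption-laden illustration, not Kafka's actual classes: `OffsetKey`, `build_offset_cache` and the sample log are hypothetical, and real records also carry metadata and timestamps.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class OffsetKey(NamedTuple):
    """Hypothetical message key: one entry per group and topic-partition."""
    group: str
    topic: str
    partition: int


def build_offset_cache(
    log: Iterable[Tuple[OffsetKey, Optional[int]]]
) -> Dict[OffsetKey, int]:
    """Replay commit messages in log order; the last write per key wins."""
    cache: Dict[OffsetKey, int] = {}
    for key, offset in log:
        if offset is None:
            cache.pop(key, None)  # a null value (tombstone) removes the entry
        else:
            cache[key] = offset
    return cache


if __name__ == "__main__":
    log = [
        (OffsetKey("consumer-group", "topic-1", 0), 10),
        (OffsetKey("consumer-group", "topic-1", 1), 7),
        (OffsetKey("consumer-group", "topic-1", 0), 42),  # newer commit
    ]
    cache = build_offset_cache(log)
    # The "select offset where group=... and topic=..." query becomes a dict lookup
    print(cache[OffsetKey("consumer-group", "topic-1", 0)])
```

The topic itself is never queried like a table; it serves as a durable log, and lookups go against the map built from it.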
Upvotes: 3