Reputation: 383
When I run this command, I get 2 topics. I know I created the test topic but I see an additional topic called "__consumer_offsets". From the name it implies that it is related to consumer offsets, but how is it being used?
$ bin/kafka-topics.sh --list --zookeeper localhost:2181
__consumer_offsets
test
$ bin/kafka-topics.sh --describe --zookeeper localhost:2181
Topic:__consumer_offsets PartitionCount:50 ReplicationFactor:1 Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=producer
Topic: __consumer_offsets Partition: 0 Leader: 0 Replicas: 0 Isr: 0
Topic: __consumer_offsets Partition: 1 Leader: 0 Replicas: 0 Isr: 0
...
Topic: __consumer_offsets Partition: 48 Leader: 0 Replicas: 0 Isr: 0
Topic: __consumer_offsets Partition: 49 Leader: 0 Replicas: 0 Isr: 0
This is happening in Kafka 1.1.0. Why are there 50 partitions? I am also looking for a way to hide this topic, because every time I run "describe" it first prints the 50 partitions of __consumer_offsets and then prints my topics.
Upvotes: 4
Views: 8267
Reputation: 346
In early versions of Kafka, offsets were managed in ZooKeeper, but Kafka has evolved continuously over time, introducing a lot of new features. Kafka now manages offsets in an internal/system-level topic, __consumer_offsets.
The 50 partitions come from the broker setting offsets.topic.num.partitions, which defaults to 50. This is separate from num.partitions, which controls the default partition count for topics you create without specifying one explicitly.
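As I understand it, Kafka routes each group's commits to one of those 50 partitions by taking the absolute value of the Java hashCode of the group id modulo the partition count. A rough Python re-implementation (function names here are my own, not Kafka's API):

```python
OFFSETS_TOPIC_PARTITIONS = 50  # broker default for offsets.topic.num.partitions

def java_string_hashcode(s):
    """Re-implementation of Java's String.hashCode() as a signed 32-bit int."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h

def offsets_partition(group_id, num_partitions=OFFSETS_TOPIC_PARTITIONS):
    """Which __consumer_offsets partition holds a group's committed offsets.
    Integer.MIN_VALUE is mapped to 0, mirroring Kafka's Utils.abs behavior."""
    h = java_string_hashcode(group_id)
    return 0 if h == -0x80000000 else abs(h) % num_partitions

print(offsets_partition("test"))  # group id "test" -> partition 48
```

This is why all commits for one group land on the same partition, and so on the same broker, which is the group's coordinator.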
Upvotes: 5
Reputation: 296
The topic __consumer_offsets
is used by consumers to store the offsets of the messages they have read. It enables recovery: when a consumer restarts, it reads the last position it committed before going down and processes from the next offset.
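As a toy illustration of that recovery behavior (an in-memory sketch of the broker-side offset store, not the real Kafka API):

```python
# Toy stand-in for the offset store that __consumer_offsets provides;
# committed offsets are keyed by (group, topic, partition).
committed = {}

def commit(group, topic, partition, offset):
    # What a consumer does on commit: record the last processed offset.
    committed[(group, topic, partition)] = offset

def resume_position(group, topic, partition):
    # What a restarted consumer does: fetch the last committed offset and
    # continue from the next one; with no commit yet, start at the beginning.
    last = committed.get((group, topic, partition))
    return 0 if last is None else last + 1

commit("my-group", "test", 0, 41)
print(resume_position("my-group", "test", 0))  # resumes at offset 42
```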
@cricket_007 was right: you can get duplicates by default in Kafka; that is the at-least-once semantics it uses.
Upvotes: 2
Reputation: 10213
Consumers store the offset of the last consumed message in the Kafka topic __consumer_offsets, keyed by consumer group id.
This lets the consumers of a group (each with its own consumer id) pick up at the message after the last consumed one and avoid processing duplicate messages.
Upvotes: 2