yogibear

Reputation: 14907

How does Zookeeper/Kafka retain offset for a consumer?

Is the offset a property of the topic/partition, or is it a property of a consumer?

If it's a property of a consumer, does that mean multiple consumers reading from the same partition could have different offsets?

Also, what happens to a consumer if it goes down? How does Kafka know it's dealing with the same consumer when it comes back online? Presumably a new client ID is generated, so it wouldn't have the same ID as before.

Upvotes: 7

Views: 3949

Answers (2)

Jakub

Reputation: 3991

In most cases it is a property of a consumer group. When writing consumers, you normally specify the consumer group in the group.id parameter. This group ID is used to store the latest offset in, and recover it from, the special topic __consumer_offsets, which lives directly in the Kafka cluster itself. The consumer group is used not only for the offset but also to ensure that each partition is consumed by only a single client per consumer group.

However, Kafka gives you a lot of flexibility: if you need to, you can store the offset somewhere else and key it on whatever criteria you want. But in most cases, following the consumer group concept and storing the offset inside Kafka is the best thing you can do.
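To make the "property of a consumer group" point concrete, here is a minimal sketch of how committed offsets are keyed. It is a toy in-memory model, not the Kafka client API: the OffsetStore class, its methods, and the group and topic names are all hypothetical and only stand in for what __consumer_offsets records.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy model; this is NOT the Kafka client API.
# A committed offset is keyed by (group.id, topic, partition).
OffsetKey = Tuple[str, str, int]


class OffsetStore:
    """Stands in for the __consumer_offsets topic."""

    def __init__(self) -> None:
        self._committed: Dict[OffsetKey, int] = {}

    def commit(self, group_id: str, topic: str, partition: int, offset: int) -> None:
        # Store the position this group should read from next.
        self._committed[(group_id, topic, partition)] = offset

    def committed(self, group_id: str, topic: str, partition: int) -> Optional[int]:
        # None means this group has never committed for this partition.
        return self._committed.get((group_id, topic, partition))


store = OffsetStore()
# Two groups read the same partition and keep independent positions.
store.commit("billing", "orders", 0, 42)
store.commit("analytics", "orders", 0, 7)

print(store.committed("billing", "orders", 0))
print(store.committed("analytics", "orders", 0))
print(store.committed("new-group", "orders", 0))
```

Because the key contains the group ID rather than a client ID, two consumers in different groups can sit at different offsets on the same partition, which answers the second part of the question.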

Upvotes: 9

Ryuzaki L

Reputation: 40048

Kafka identifies a consumer by its group.id, which is a consumer property that each consumer should set. The documentation describes it as:

A unique string that identifies the consumer group this consumer belongs to. This property is required if the consumer uses either the group management functionality by using subscribe(topic) or the Kafka-based offset management strategy

As for the offset, it concerns both the consumer and the broker: the consumer commits it and the broker stores it. Whenever a consumer consumes messages from a Kafka topic, it commits an offset. The committed offset is the position of the next message to read, so after consuming messages 1 to 10 the consumer commits 11 and resumes from 11 next time. Offsets can be committed manually or automatically, which is controlled by enable.auto.commit:

If true the consumer's offset will be periodically committed in the background.

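Each consumer group has its own committed offsets. When a consumer starts with a group.id that already has committed offsets, it resumes from them, which is how a restarted consumer picks up where it left off. A group.id with no committed offsets is treated as a new consumer group.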
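The restart behaviour can be sketched with a small simulation. This is a toy model, not the Kafka client API: poll_and_commit, the list used as a partition log, and the group names are hypothetical. It also assumes a group with no committed offset starts at the beginning of the log, similar to auto.offset.reset=earliest (Kafka's default is latest).

```python
from typing import Dict, List


def poll_and_commit(
    log: List[str], committed: Dict[str, int], group_id: str, max_records: int
) -> List[str]:
    """Read up to max_records for a group, then commit the next position."""
    # Assumption: a group without a committed offset starts at offset 0.
    start = committed.get(group_id, 0)
    batch = log[start:start + max_records]
    # The committed offset is the position of the next record to read.
    committed[group_id] = start + len(batch)
    return batch


log = [f"msg-{i}" for i in range(6)]  # one partition, offsets 0..5
committed: Dict[str, int] = {}        # stands in for __consumer_offsets

first = poll_and_commit(log, committed, "billing", 3)

# Simulated restart: the consumer keeps no local state,
# it only presents the same group.id again.
second = poll_and_commit(log, committed, "billing", 3)

# A different group has no committed offset, so it starts from the beginning.
other = poll_and_commit(log, committed, "analytics", 2)

print(first, second, other, committed)
```

Nothing in the sketch depends on a client ID surviving the restart; the group ID alone is enough to find the stored position.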

Upvotes: 1
