Reputation: 9540
I'm reading the Kafka documentation about consumers and came across the following definition of message consumption:
Our topic is divided into a set of totally ordered partitions, each of which is consumed by exactly one consumer within each subscribing consumer group at any given time. This means that the position of a consumer in each partition is just a single integer, the offset of the next message to consume.
I interpreted the wording as follows:
A consumer group reads data from a topic consisting of a number of partitions. Each consumer in the group is then assigned a subset of partitions that does not overlap with the partitions of any other consumer in the group.
Consider the following case:
A consumer group GRP consisting of 2 consumers C1 and C2 reads data from a topic TPC consisting of 2 partitions P1 and P2.
QUESTION: If at some point C1 reads from P1 and C2 reads from P2, can a rebalance result in C1 reading from P2 and C2 from P1? If so, under which conditions may that happen?
It would not contradict the quote above.
Upvotes: 2
Views: 136
Reputation: 18515
There are a few things to discuss in your question and comment.
Your interpretation of the quoted paragraph is correct.
Question "If so under which condition may that happen?": Yes, this scenario can happen. A change in the assignment of a consumer to a TopicPartition is mainly triggered through a rebalancing. A consumer rebalance will be triggered in the following cases:
Consumer rebalances are initiated when
A Consumer leaves the Consumer group (either by failing to send a timely heartbeat or by explicitly requesting to leave)
A new Consumer joins the Consumer Group
A Consumer changes its Topic subscription
The Consumer Group notices a change to the Topic metadata for any subscribed Topic (e.g. an increase in the number of Partitions)
[Source: Training Material of Confluent Kafka Developer]
Keep in mind that during a rebalance all consumers are paused.
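If you want to watch such reassignments happen, a minimal sketch with the plain Java consumer is to register a ConsumerRebalanceListener when subscribing. The broker address below is an assumption; the topic and group names follow the example from your question:

```java
import java.time.Duration;
import java.util.Collection;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class RebalanceLoggingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: local broker
        props.put("group.id", "GRP");                     // the group from the question
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("TPC"), new ConsumerRebalanceListener() {
                @Override
                public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                    // Called before a rebalance takes partitions away from this consumer.
                    System.out.println("Revoked: " + partitions);
                }

                @Override
                public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                    // Called after a rebalance; may contain P1, P2, both, or neither.
                    System.out.println("Assigned: " + partitions);
                }
            });

            while (true) {
                consumer.poll(Duration.ofMillis(500)).forEach(record ->
                        System.out.printf("P%d offset %d: %s%n",
                                record.partition(), record.offset(), record.value()));
            }
        }
    }
}
```

If you start a second instance of this program with the same group.id, the first instance will log a revocation followed by a new assignment, and depending on the configured partition assignor the rebalance may well end up swapping P1 and P2 between the two consumers.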
This scenario is unrelated to a consumer rebalance, though: your consumer C1 could simply die after processing the data but before committing the offsets back to Kafka. If you then restart C1, it will read the same messages again because it did not yet commit their offsets.
This is called "at-least-once" delivery semantics, and it differs from "at-most-once" semantics, which you get e.g. with auto.commit enabled. I guess you are looking for the "holy grail" of distributed systems, which is "exactly-once" semantics :)
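As a minimal sketch of the at-least-once pattern (broker address is an assumption, and process() is a hypothetical stand-in for your actual processing): offsets are committed manually only after processing, so a crash between processing and commit causes re-reads rather than data loss.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AtLeastOnceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: local broker
        props.put("group.id", "GRP");
        props.put("enable.auto.commit", "false"); // commit manually, after processing
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("TPC"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // a crash here => record is re-read after restart
                }
                consumer.commitSync(); // commit only after processing => at-least-once
            }
        }
    }

    // Hypothetical processing step; replace with your actual sink logic.
    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("P%d offset %d: %s%n",
                record.partition(), record.offset(), record.value());
    }
}
```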
To achieve this you need to consider the entire application, from Kafka to the sink of your application. If the output of your application is not idempotent, you will likely not be able to achieve exactly-once semantics (EOS). But if your output sink is, e.g., Kafka again, you actually can achieve EOS.
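For the Kafka-to-Kafka case, the mechanism is the transactional read-process-write pattern, sketched below with the plain Java clients. The output topic TPC-out, the transactational processing step, and the transactional.id are made up for illustration; in recent versions, Kafka Streams wraps the same mechanism for you via processing.guarantee=exactly_once_v2.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

public class EosSketch {
    public static void main(String[] args) {
        Properties cProps = new Properties();
        cProps.put("bootstrap.servers", "localhost:9092"); // assumption: local broker
        cProps.put("group.id", "GRP");
        cProps.put("enable.auto.commit", "false");         // offsets go into the transaction
        cProps.put("isolation.level", "read_committed");   // only see committed transactions
        cProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        cProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Properties pProps = new Properties();
        pProps.put("bootstrap.servers", "localhost:9092");
        pProps.put("transactional.id", "my-eos-app-1");    // enables transactions + idempotence
        pProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        pProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(pProps)) {
            producer.initTransactions();
            consumer.subscribe(List.of("TPC"));

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                if (records.isEmpty()) continue;

                producer.beginTransaction();
                Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                for (ConsumerRecord<String, String> record : records) {
                    // Trivial stand-in transformation; replace with your processing.
                    producer.send(new ProducerRecord<>("TPC-out",
                            record.key(), record.value().toUpperCase()));
                    offsets.put(new TopicPartition(record.topic(), record.partition()),
                                new OffsetAndMetadata(record.offset() + 1));
                }
                // Consumed offsets are committed atomically with the produced output,
                // so output and progress either both happen or neither does.
                producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
                producer.commitTransaction();
            }
        }
    }
}
```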
Upvotes: 1