Reputation: 12025
As you can see in the picture below, there are two consumers, both reading the same partition.
Why should consumer 2 read all the messages that consumer 1 reads? How can this be useful in practice?
How can I find out the size of a partition and the start/end positions to read from?
Does Kafka remember the partition offset for each consumer? Is it like a fanout exchange in RabbitMQ?
Upvotes: 0
Views: 255
Reputation: 4662
In Kafka, each topic is divided into partitions. A consumer group (CG) consists of consumers that share the same group ID. Kafka assigns a subset of the topic's partitions to each consumer in the group. For example, if your topic has 4 partitions and your CG has two consumers, each consumer is assigned two partitions: say, consumer 1 gets partitions 1 and 3, and consumer 2 gets partitions 2 and 4. Unless there is a rebalance, the consumers read only their assigned partitions and don't touch any others. Kafka stores the last committed offset for each partition (in the internal `__consumer_offsets` topic) so that, if a rebalance occurs, the consumers taking over know where to start from. This metadata is maintained per consumer group.
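The assignment described above can be sketched as a toy Python model. This is not the Kafka client API: `assign_round_robin` and the consumer names are hypothetical, and real consumers use whatever assignor is configured in `partition.assignment.strategy`, so the actual mapping may differ. Kafka numbers partitions from 0, so the answer's partitions 1 and 3 show up here as 0 and 2.

```python
def assign_round_robin(partitions, consumers):
    """Toy assignor (hypothetical): deal partitions out to the group's consumers in turn.

    Each partition ends up with exactly one consumer, so members of the
    same group never read the same partition.
    """
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


# Hypothetical topic with 4 partitions and a group of two consumers.
assignment = assign_round_robin([0, 1, 2, 3], ["consumer-1", "consumer-2"])
print(assignment)  # {'consumer-1': [0, 2], 'consumer-2': [1, 3]}
```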
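When you add a new CG, it has no committed offsets yet, so where its consumers start is decided by the `auto.offset.reset` setting. With `earliest`, they start from the beginning of every partition (the earliest retained offset, which is 0 only if nothing has been deleted yet), regardless of what other groups have consumed. This is very useful. Here is an example from my work: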
We consume from a topic, and quite often some events fail processing. Until now we did not have a dead-letter queue to push these failed events to, so to replay them we would find the keys of the failed events, change our processor to process only those events, and deploy it. We would also change the consumer group ID so that the newly deployed service started consuming from the earliest offset of every partition. This way, we replayed all the events but processed only the affected ones.
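A minimal sketch of that replay trick, written as a toy in-memory model rather than real Kafka client code. `starting_offset`, `replay`, the group names and the offsets are all hypothetical. What it illustrates: a new group has no committed offsets, so its start position comes from `auto.offset.reset` (the Java client defaults to `latest`), and `earliest` means the oldest *retained* offset, which need not be 0.

```python
def starting_offset(committed, group, partition, log_start, log_end,
                    auto_offset_reset="latest"):
    """Toy model of where a consumer in `group` starts reading `partition`.

    committed maps (group, partition) -> next offset to read. log_start is the
    earliest retained offset, log_end the offset the next record will get.
    (Ignores the case where a committed offset was already deleted.)
    """
    if (group, partition) in committed:
        return committed[(group, partition)]  # resume where the group left off
    if auto_offset_reset == "earliest":
        return log_start  # oldest retained record, not necessarily 0
    if auto_offset_reset == "latest":
        return log_end  # only records produced from now on
    raise ValueError("no committed offset and auto.offset.reset=none")


def replay(log, start_offset, keys_to_process):
    """Scan the partition from start_offset, but handle only the failed keys."""
    handled = []
    for offset, key, value in log:
        if offset < start_offset:
            continue
        if key in keys_to_process:
            handled.append((offset, key, value))  # real code would re-run processing here
    return handled


# Hypothetical partition whose oldest records were already removed by retention,
# so the earliest retained offset is 100, not 0.
partition_log = [
    (100, "order-1", "created"),
    (101, "order-2", "created"),
    (102, "order-1", "paid"),
    (103, "order-3", "created"),
]
log_start, log_end = 100, 104
committed = {("orders-service", 0): 103}  # the existing group is nearly caught up
failed_keys = {"order-2", "order-3"}

# A brand-new group has no committed offset, so auto.offset.reset decides.
start = starting_offset(committed, "orders-replay", 0, log_start, log_end, "earliest")
print(start)  # 100
print(replay(partition_log, start, failed_keys))
# [(101, 'order-2', 'created'), (103, 'order-3', 'created')]
```

With the default `latest`, the new group would start at 104 and replay nothing, which is why the reset policy matters for this trick. Kafka's `kafka-consumer-groups.sh` tool can also rewind an existing group with `--reset-offsets --to-earliest` while that group's consumers are stopped, instead of switching to a new group ID.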
Upvotes: 0
Reputation: 3522
For example, Consumer A (or consumer group 1) consumes data for monitoring and alerting, whereas Consumer B (or consumer group 2) consumes the same data to load it into Hadoop or Amazon S3.
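A toy model of two groups reading one partition, showing why each sees every message. The group names and the `poll` helper are hypothetical, not the Kafka API. Each group only advances its own committed offset over a single shared log; the effect resembles a RabbitMQ fanout exchange, except that Kafka does not copy messages into a separate queue per consumer. For simplicity, every new group starts at offset 0 here.

```python
from collections import defaultdict

partition_log = ["m0", "m1", "m2", "m3"]  # one partition, offsets 0..3
# (group, partition) -> next offset; toy default 0, like "earliest" on an untrimmed log
committed = defaultdict(int)


def poll(group, partition=0, max_records=10):
    """Return the next records for `group`, advancing only that group's offset."""
    start = committed[(group, partition)]
    records = partition_log[start:start + max_records]
    committed[(group, partition)] = start + len(records)
    return records


alerting = poll("monitoring-alerting")  # hypothetical group names
archiver = poll("s3-archiver")
print(alerting == archiver == ["m0", "m1", "m2", "m3"])  # True
```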
Consumer groups also let you ingest data efficiently: when one consumer goes down, another consumer in the same group takes over its partitions, and you can add or remove consumers to scale throughput.
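Failover can be sketched the same way, again as a toy rather than the real rebalance protocol: `assign_round_robin`, the `ingest` group and its offsets are hypothetical. The detail that matters is that offsets are committed per group and partition, not per consumer instance, so a surviving consumer can pick up from the failed one's last commit.

```python
def assign_round_robin(partitions, consumers):
    """Toy assignor (hypothetical); real Kafka assignors may map partitions differently."""
    members = sorted(consumers)
    assignment = {c: [] for c in members}
    for i, p in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(p)
    return assignment


# Offsets are stored per (group, partition), not per consumer instance.
committed = {("ingest", 0): 40, ("ingest", 1): 17, ("ingest", 2): 33, ("ingest", 3): 8}
partitions = [0, 1, 2, 3]

before = assign_round_robin(partitions, ["consumer-a", "consumer-b"])
# consumer-b goes down; the group rebalances onto the surviving member.
after = assign_round_robin(partitions, ["consumer-a"])

# consumer-a resumes every partition at the group's committed offset; records
# consumer-b processed after its last commit will be processed again.
resume_at = {p: committed[("ingest", p)] for p in after["consumer-a"]}
print(before)     # {'consumer-a': [0, 2], 'consumer-b': [1, 3]}
print(resume_at)  # {0: 40, 1: 17, 2: 33, 3: 8}
```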
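Partition size is governed by topic-level retention settings, such as `retention.bytes` (a size limit applied per partition) and `retention.ms`.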
Lastly, for offsets, refer to https://stackoverflow.com/a/57003889/10504469
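On the question about partition size and start/end positions: in the Java client, `KafkaConsumer.beginningOffsets` and `KafkaConsumer.endOffsets` return, for a set of partitions, the first retained offset and the offset after the last available record, and `kafka-consumer-groups.sh --describe` prints each group's current offset, log-end offset and lag. The arithmetic on those numbers is simple; this sketch uses a hypothetical `PartitionState` with made-up values.

```python
from dataclasses import dataclass


@dataclass
class PartitionState:
    """Toy snapshot of one partition as seen by one consumer group (hypothetical)."""
    log_start_offset: int  # earliest retained offset; moves up as retention deletes data
    log_end_offset: int    # offset the next produced record will receive
    committed_offset: int  # next offset this group will read

    def retained_records(self) -> int:
        # Upper bound: compaction and transaction markers can leave offsets
        # that hold no user record.
        return self.log_end_offset - self.log_start_offset

    def lag(self) -> int:
        # How far the group is behind the end of the partition.
        return self.log_end_offset - self.committed_offset


p = PartitionState(log_start_offset=1_000, log_end_offset=1_250, committed_offset=1_200)
print(p.retained_records(), p.lag())  # 250 50
```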
Upvotes: 0
Reputation: 1418
In your example, consumer 1 and consumer 2 are in different consumer groups, which may not be the right setup, depending on your application's needs.
Consumer groups are defined per application: all instances of one service should share the same consumer group ID. That way, the more consumers you add to the group, the further you can scale out, provided the topic has a sensible number of partitions, since each partition is read by at most one consumer in the group.
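A toy illustration of that partition-count limit (the helper and names are hypothetical, not the Kafka API): because a partition is owned by at most one consumer in a group, consumers beyond the partition count get nothing to read and sit idle.

```python
def assign_round_robin(partitions, consumers):
    """Toy assignor (hypothetical): each partition goes to exactly one consumer in the group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


# Hypothetical: 3 partitions, 4 consumers in the same group.
result = assign_round_robin([0, 1, 2], ["c1", "c2", "c3", "c4"])
idle = [c for c, parts in result.items() if not parts]
print(idle)  # ['c4']
```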
So in your example, it's completely normal that consumer 2 reads all the messages read by consumer 1: they don't share the same group ID, so they behave as if they belonged to different applications (for example, one might consume messages for accounting and the other for monitoring).
If they were in the same group, they would share the partitions among themselves and would not read the same messages.
Kafka is not a queue; it is a log with a poll-based architecture, which explains this behavior.
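To illustrate the queue-versus-log difference, here is a toy contrast in Python. `deque` stands in for queue-style delivery and the hypothetical `ToyLogConsumer` for a Kafka-style consumer; the real Java client has comparable `poll(Duration)` and `seek(TopicPartition, long)` methods, but this is not that API. Reading from the log removes nothing, and each consumer owns its position, so it can rewind.

```python
from collections import deque

# Queue-style delivery: taking a message removes it for everyone else.
queue = deque(["a", "b", "c"])
first_reader = [queue.popleft() for _ in range(len(queue))]
second_reader = list(queue)  # [] because the messages are gone


class ToyLogConsumer:
    """Hypothetical pull-based reader over a shared log; only its own position changes."""

    def __init__(self, log):
        self.log = log
        self.position = 0  # next offset to read

    def poll(self, max_records=10):
        batch = self.log[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def seek(self, offset):
        self.position = offset  # rewinding just moves the position


log = ["a", "b", "c"]
consumer = ToyLogConsumer(log)
print(consumer.poll())  # ['a', 'b', 'c']
consumer.seek(1)
print(consumer.poll())  # ['b', 'c']
print(log)              # ['a', 'b', 'c'] (reading removed nothing)
```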
For your other questions about offsets, a quick search turns up plenty of articles on the subject.
This one is a good start: https://www.oreilly.com/library/view/kafka-the-definitive/9781491936153/ch04.html
Yannick
Upvotes: 1