user8215502

Reputation: 201

How Kafka partitions behave

Can you explain how Kafka partitions work in this scenario?

Suppose I produce 9 messages (1-9) round robin to 1 topic with 3 partitions.

Does that mean that:

Partition 1 contains: [1,4,7]

Partition 2 contains: [2,5,8]

Partition 3 contains: [3,6,9]

?

Also, how many consumers can get all the data? 3? Why?

Can you explain?

I also guess that a consumer group can solve this, but I'm not sure why.

Upvotes: 1

Views: 221

Answers (3)

Hans Jespersen

Reputation: 8335

The distribution of messages across the partitions is correct only if you publish messages without keys. In Kafka it is common to publish messages as (key, value) pairs, and if you produce messages this way the default partitioner ensures that all messages with the same key are put in the same partition. It does this by applying a hash function to each key and mapping the result to one of the available partitions. In the extreme case where all your messages have the same key, they would all go to the same partition. If every message had either the string key "foo" or the string key "bar", then all the messages with key "foo" might go to partition 3 and all the messages with key "bar" might go to partition 1.
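To make the keyless vs. keyed behaviour concrete, here is a minimal simulation in plain Python (no Kafka client involved). The function names are made up for illustration, and CRC32 is only a stand-in for the hash the real client uses (the Java client uses murmur2), so the actual partition a given key lands on will differ. Note also that Kafka numbers partitions from 0, and that the keyless strategy depends on the client version; the sketch assumes strict round robin as in the question.

```python
import zlib
from itertools import count

NUM_PARTITIONS = 3


def make_round_robin():
    """Return a chooser that cycles through partitions 0..N-1 (keyless case)."""
    counter = count()
    return lambda: next(counter) % NUM_PARTITIONS


def partition_for_key(key: str) -> int:
    """Map a key to a partition with a deterministic hash.

    Assumption: CRC32 is a stand-in here, not the hash Kafka really uses.
    """
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def produce(messages):
    """messages: iterable of (key, value) pairs; key may be None."""
    partitions = [[] for _ in range(NUM_PARTITIONS)]
    next_round_robin = make_round_robin()
    for key, value in messages:
        p = next_round_robin() if key is None else partition_for_key(key)
        partitions[p].append(value)
    return partitions


# No keys: messages 1-9 are spread evenly.
keyless = produce((None, n) for n in range(1, 10))
print(keyless)  # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]

# Same key on every message: all nine end up in a single partition.
same_key = produce(("foo", n) for n in range(1, 10))
print(same_key)
```

The important property is only that the key-to-partition mapping is deterministic, which is what keeps all messages for one key together.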

In terms of your question about consumers, you can have an unlimited number of consumers. If each consumer has a unique group.id then they are considered independent and they will each get their own full set of the messages from all partitions.

However, if you have consumers that share the same group.id, they are said to be in a consumer group, and each will get an exclusive and roughly equal subset of the partitions. If you had 3 consumers in the same group, they would get 1 partition each. If you added more than 3 consumers to the same group, 3 of them would get 1 partition each and all the others would be standby consumers that only become active if one of the 3 active consumers leaves the group.
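A rough way to picture the group.id semantics is the following toy model in plain Python. It is not Kafka code: `deliver` and the consumer names are hypothetical, and the simple modulo assignment inside a group is an assumption made to keep the sketch short (the real assignment is decided by the group's configured assignor).

```python
from collections import defaultdict

# Partition contents from the question, using 0-based partition numbers.
PARTITIONS = {0: [1, 4, 7], 1: [2, 5, 8], 2: [3, 6, 9]}


def deliver(consumers):
    """consumers: list of (consumer_name, group_id) pairs.

    Returns {consumer_name: sorted list of messages it receives}.
    """
    groups = defaultdict(list)
    for name, group_id in consumers:
        groups[group_id].append(name)

    received = {name: [] for name, _ in consumers}
    for members in groups.values():
        # Every group reads every partition, but inside a group
        # each partition has exactly one owner.
        for partition, messages in PARTITIONS.items():
            owner = members[partition % len(members)]
            received[owner].extend(messages)
    return {name: sorted(msgs) for name, msgs in received.items()}


# Two consumers with different group ids: each gets all nine messages.
independent = deliver([("a", "g1"), ("b", "g2")])
print(independent)

# Four consumers sharing one group id: three get a partition, one is idle.
shared = deliver([("a", "g"), ("b", "g"), ("c", "g"), ("d", "g")])
print(shared)  # {'a': [1, 4, 7], 'b': [2, 5, 8], 'c': [3, 6, 9], 'd': []}
```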

Upvotes: 1

ImbaBalboa

Reputation: 867

The distribution of the messages across the partitions is correct in principle. Partitions are Kafka's unit of parallelism.

You can have 3 consumers that each handle one partition, but you can also have just 1 consumer that reads the data from all 3 partitions. It depends on the throughput each consumer can handle or needs.

Concerning consumer groups:

  • If all your consumers are in the same consumer group, the messages will be load balanced across the consumers
  • If your consumers are in different consumer groups, then each message will be broadcast to all consumer processes

FYI: message order is only preserved within a partition, which is why messages coming from different partitions may be consumed out of order.
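The ordering point can be illustrated with a small sketch in plain Python. It models a single consumer reading three partition logs and picking the next partition to read from at random; the random choice is an assumption standing in for whatever order fetches happen to complete in, and the helper names are hypothetical.

```python
import random

# Partition logs from the question; list position is the offset.
PARTITION_LOGS = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]


def consume_interleaved(partition_logs, seed):
    """Read every partition to the end, choosing the next partition arbitrarily."""
    rng = random.Random(seed)
    cursors = [0] * len(partition_logs)  # next offset to read, per partition
    seen = []
    while True:
        ready = [p for p, log in enumerate(partition_logs) if cursors[p] < len(log)]
        if not ready:
            return seen
        p = rng.choice(ready)
        seen.append(partition_logs[p][cursors[p]])
        cursors[p] += 1


def is_subsequence(part, seen):
    """True if the items of `part` appear in `seen` in the same relative order."""
    remaining = iter(seen)
    return all(item in remaining for item in part)


seen = consume_interleaved(PARTITION_LOGS, seed=0)
print(seen)  # all nine messages, not necessarily in 1..9 order
# Within each partition the original order is still intact.
print(all(is_subsequence(log, seen) for log in PARTITION_LOGS))  # True
```

If global ordering matters, the usual options are a single partition or a key chosen so that related messages share a partition.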

Upvotes: 0

Michal Borowiecki

Reputation: 4314

Can you explain how Kafka partitions work in this scenario?

Your understanding is correct.

Also, how many consumers can get all the data? 3? Why?

It depends on how many consumers you have in your consumer group.

If you only have 1 consumer in a group, it will get all the messages from all partitions.

If you have 2 consumers in a group, each will claim a subset of the partitions, e.g. 1st consumer will get all messages from partitions 1 and 2 and the 2nd consumer will get messages from partition 3.

If you have 3 consumers in a group, each will get one partition assigned.

If you have more than 3 consumers in a group, 3 consumers will get one partition each and the remaining consumers will not get any messages; they just act as redundancy in case of failover.
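The four cases above can be reproduced with a short range-style assignment sketch in plain Python. `range_assign` is a hypothetical helper written for this answer, loosely modelled on how a range assignor splits a sorted partition list among sorted consumers; the strategy your consumers actually use depends on their configuration. Partitions are numbered from 0 here, so "partitions 1 and 2" above correspond to 0 and 1.

```python
def range_assign(num_partitions, consumers):
    """Split partitions 0..num_partitions-1 into contiguous ranges, one per consumer.

    The first (num_partitions % len(consumers)) consumers get one extra partition.
    """
    consumers = sorted(consumers)
    base, extra = divmod(num_partitions, len(consumers))
    assignment = {}
    start = 0
    for i, consumer in enumerate(consumers):
        size = base + (1 if i < extra else 0)
        assignment[consumer] = list(range(start, start + size))
        start += size
    return assignment


for n in range(1, 5):
    members = [f"c{i}" for i in range(n)]
    print(n, range_assign(3, members))
# 1 {'c0': [0, 1, 2]}
# 2 {'c0': [0, 1], 'c1': [2]}
# 3 {'c0': [0], 'c1': [1], 'c2': [2]}
# 4 {'c0': [0], 'c1': [1], 'c2': [2], 'c3': []}
```

The empty list for `c3` is the standby consumer: it holds no partition until a rebalance hands it one.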

Upvotes: 1
