Gadam

Reputation: 3024

Kafka default partitioner behavior when there are more producers than partitions

From the Kafka FAQ page:

In Kafka producer, a partition key can be specified to indicate the destination partition of the message. By default, a hashing-based partitioner is used to determine the partition id given the key

So all messages with a particular key always go to the same partition of a topic. My questions:

  1. How does the consumer know which partition the producer wrote to, so it can consume directly from that partition?
  2. If there are more producers than partitions, and multiple producers are writing to the same partition, how are the offsets ordered so that the consumers can consume messages from specific producers?

Upvotes: 0

Views: 2517

Answers (2)

Ran Lupovich

Reputation: 1821

Kafka is a distributed event streaming platform. One of its use cases is decoupling services: a producer (one application) writes messages to topics, and a consumer (another application) reads from those topics.

If you have more than one producer, the relative order of their data in a Kafka topic partition is not guaranteed. The order is simply the order in which the broker appends the messages to the partition. (Even with a single producer, retries can cause ordering issues; read about the idempotent producer.)

Offsets are assigned atomically as records are appended, which guarantees that no two messages in a partition get the same offset.

The offset is a sequential number; it has meaning only within a specific topic and a specific partition.
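To make the offset behavior concrete, here is a toy model in Python. It is not Kafka code: `PartitionLog`, `append`, and `run_producer` are hypothetical names, and a lock stands in for the broker appending to a partition one record at a time. It only illustrates that concurrent producers get unique, gap-free offsets, and that each producer's own messages keep their relative order while the interleaving between producers is arbitrary.

```python
import threading


class PartitionLog:
    """Toy stand-in for a single topic partition (an append-only list)."""

    def __init__(self):
        self._records = []
        self._lock = threading.Lock()

    def append(self, producer_id, value):
        # The lock models the broker appending one record at a time:
        # the offset is simply the record's position in the log.
        with self._lock:
            offset = len(self._records)
            self._records.append((producer_id, value))
            return offset

    def records(self):
        return list(self._records)


def run_producer(log, producer_id, count, offsets_out):
    # Each producer sends its messages in order: 0, 1, 2, ...
    for i in range(count):
        offsets_out.append(log.append(producer_id, i))


log = PartitionLog()
offsets = []
producers = ["producer-a", "producer-b", "producer-c"]
threads = [
    threading.Thread(target=run_producer, args=(log, pid, 100, offsets))
    for pid in producers
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Offsets are unique and contiguous, whatever the interleaving was.
print(sorted(offsets) == list(range(300)))
```

A consumer reading this log from offset 0 sees records from all three producers mixed together; the offset says nothing about which producer wrote a record.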

If you use the default partitioner, the murmur2 algorithm decides which partition a keyed message is sent to. When you send a record that contains a key, the partitioner in the producer hashes the key bytes and takes the result modulo the number of partitions in the topic; that gives the partition number. Every producer runs the same murmur2 function, so for the same key you keep getting the same partition no matter which producer sends it (as long as the number of partitions does not change).
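As a rough illustration, the key-to-partition mapping can be sketched in Python. This is my own port of the 32-bit murmur2 variant used by the Java client, written from memory, so treat it as a sketch rather than a byte-for-byte reference; `partition_for` is a hypothetical helper name, and the key and partition count are made up.

```python
MASK32 = 0xFFFFFFFF


def murmur2(data: bytes) -> int:
    """32-bit murmur2, emulating Java int arithmetic with explicit masking."""
    length = len(data)
    m = 0x5BD1E995
    r = 24
    h = (0x9747B28C ^ length) & MASK32  # seed XOR length

    # Mix the input four bytes at a time (little-endian).
    for i in range(length // 4):
        k = int.from_bytes(data[i * 4:i * 4 + 4], "little")
        k = (k * m) & MASK32
        k ^= k >> r
        k = (k * m) & MASK32
        h = (h * m) & MASK32
        h ^= k

    # Handle the last 1 to 3 bytes, if any.
    tail = length & ~3
    rem = length % 4
    if rem == 3:
        h ^= data[tail + 2] << 16
    if rem >= 2:
        h ^= data[tail + 1] << 8
    if rem >= 1:
        h ^= data[tail]
        h = (h * m) & MASK32

    h ^= h >> 13
    h = (h * m) & MASK32
    h ^= h >> 15
    return h


def partition_for(key: bytes, num_partitions: int) -> int:
    # Clear the sign bit, then take the hash modulo the partition count.
    return (murmur2(key) & 0x7FFFFFFF) % num_partitions


# Two independent "producers" computing the partition for the same key.
from_producer_a = partition_for(b"user-42", 6)
from_producer_b = partition_for(b"user-42", 6)
print(from_producer_a == from_producer_b)
```

Because the function depends only on the key bytes and the partition count, no coordination between producers is needed for them to agree.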

The consumer is assigned (or subscribes) to topic partitions; it does not know which key was sent to which partition. Within a consumer group, an assignor function decides which consumer handles which partition.
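The idea of an assignor can be sketched as follows. This is a simplified round-robin split written for illustration, not the implementation of any of Kafka's built-in assignors; `assign_round_robin` and the consumer names are hypothetical.

```python
def assign_round_robin(partitions, consumers):
    """Spread partitions over the consumers of one group, one at a time."""
    ordered = sorted(consumers)
    assignment = {consumer: [] for consumer in ordered}
    for i, partition in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(partition)
    return assignment


# Six partitions shared by a group of two consumers.
assignment = assign_round_robin(range(6), ["consumer-1", "consumer-2"])
print(assignment)
# {'consumer-1': [0, 2, 4], 'consumer-2': [1, 3, 5]}

# With more consumers than partitions, the extra consumers get nothing.
crowded = assign_round_robin(range(2), ["c1", "c2", "c3"])
print(crowded)
# {'c1': [0], 'c2': [1], 'c3': []}
```

Note that keys and producers never appear in this function: the split is purely by partition.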

Upvotes: 0

OneCricketeer

Reputation: 191681

How does the consumer know which partition the producer wrote to

It doesn't need to, or at least shouldn't, as this would create tight coupling between clients. All consumer instances should be responsible for handling all messages for the subscribed topic. While you can assign a Consumer to a list of TopicPartition instances, and you can call the methods of the DefaultPartitioner for a given key to find out which partition it would have gone to, I've personally not run across a need for that. Also, keep in mind that Producers have full control over the partitioner.class setting and do not need to inform Consumers about it.

If there are more producers than partitions, and multiple producers are writing to the same partition, how are the offsets ordered...

The number of producers or partitions doesn't matter. Batches are written to partitions sequentially, in the order the broker receives them. You can limit the number of unacknowledged requests in flight per Producer client (and you only need one instance per application) with max.in.flight.requests.per.connection, but across separate applications you of course cannot control any ordering.
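For reference, the producer settings that affect ordering within a single client could look roughly like this. It is only a plain Python dict using the standard property names; the variable names are made up, and handing the dict to an actual client library is left out.

```python
# Ordering-related producer properties (standard Kafka property names).
ordered_producer_config = {
    "acks": "all",
    # The idempotent producer keeps per-partition order across retries
    # as long as at most 5 requests are in flight per connection.
    "enable.idempotence": True,
    "max.in.flight.requests.per.connection": 5,
}

# Without idempotence, one in-flight request at a time avoids a retried
# batch landing behind a later one (at some cost in throughput).
strict_producer_config = {
    **ordered_producer_config,
    "enable.idempotence": False,
    "max.in.flight.requests.per.connection": 1,
}

print(strict_producer_config["max.in.flight.requests.per.connection"])
```

Either way, this only governs ordering for one producer instance; it does nothing to order records between different applications.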

so that the consumers can consume messages from specific producers?

Again, this should not be done.

Upvotes: 1
