Two different types of partitioning in a Kafka producer

In my Kafka producer, I am sending two different sets of data to a topic that has two partitions. The first set is sent with a key and the second without a key. As far as I know, the key is used to choose the partition for the data. If the key is absent, null is sent and the partition is chosen by round-robin scheduling.

My question is: if I alternate between sending data with and without a key for some period of time, what will happen?

Will round-robin scheduling apply only to the partitions other than the one chosen by the key, or will it apply to both partitions?

Upvotes: 1

Views: 1026

Answers (3)

Pradeep Singh

Reputation: 1144

One correction: you said that the key is used to make partitions for the data. The key is really sent with a message to preserve message ordering for a specific field.

  • If key=null, data is sent round-robin (to a different partition, and possibly to a different broker in a distributed environment, but always to the same topic).
  • If a key is sent, then all messages for that key will always go to the same partition.

Explanation and example

  • The key can be any string, integer, etc. Take an integer employee_id as the key, for example.
  • Messages for employee_id 123 might always go to partition 0, and messages for employee_id 345 might always go to partition 1. This is decided by the key hashing algorithm, whose result depends on the number of partitions.
  • If you don't send a key, the message can go to any partition, chosen by round-robin.
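The key-to-partition mapping described above can be sketched roughly as follows. This is a minimal illustration, not Kafka's code: `KeyedPartitionSketch` and `partitionForKey` are hypothetical names, and `Arrays.hashCode` is only a stand-in for the hash the real client uses, so the partition numbers it prints will not match a real cluster.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: Arrays.hashCode stands in for the producer's real key hash.
class KeyedPartitionSketch {

    // Maps a key to a partition index in the range [0, numPartitions).
    static int partitionForKey(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);
        // Clear the sign bit so the modulo result is never negative.
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 2;
        // The same employee_id appears twice and lands on the same partition both times.
        for (String employeeId : new String[] {"123", "345", "123"}) {
            System.out.println("employee_id " + employeeId + " -> partition "
                    + partitionForKey(employeeId, numPartitions));
        }
    }
}
```

Because the result is taken modulo the partition count, the same key can map to a different partition if the number of partitions on the topic changes.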

Upvotes: 1

Nitin

Reputation: 3832

Kafka selects the partition according to the rules below:

  1. If a custom partitioner is configured, the partition is selected by the custom partitioner's logic.
  2. If there is no custom partitioner, Kafka uses the DefaultPartitioner:

a. If the key is null, the partition is selected round-robin.

b. If the key is non-null, it uses a Murmur2 hash of the key, modulo the number of partitions of the topic, to identify the partition.

So, with the DefaultPartitioner and no custom partitioner defined, messages both with and without a key can be published to either of the two partitions.
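To make the answer to the original question concrete, here is a small simulation of the two rules above with keyed and unkeyed records sent alternately. It is a sketch under assumptions: `MixedKeySketch` and `choosePartition` are hypothetical names, `Arrays.hashCode` stands in for Murmur2, and null keys follow strict round-robin as described here (newer client versions may instead keep null-key records on one partition per batch).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Simulation only; this does not use or reproduce the Kafka client classes.
class MixedKeySketch {
    private final int numPartitions;
    // Advances only when a record without a key is sent.
    private final AtomicInteger counter = new AtomicInteger(0);

    MixedKeySketch(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    int choosePartition(String key) {
        if (key == null) {
            // Round-robin over ALL partitions, including any that keyed records use.
            return (counter.getAndIncrement() & 0x7fffffff) % numPartitions;
        }
        // Stand-in hash (assumption): the real default partitioner hashes with Murmur2.
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        MixedKeySketch partitioner = new MixedKeySketch(2);
        for (int i = 0; i < 6; i++) {
            // Alternate between a keyed record and a record with no key.
            String key = (i % 2 == 0) ? "employee-123" : null;
            System.out.println("key=" + key + " -> partition "
                    + partitioner.choosePartition(key));
        }
    }
}
```

In this model the keyed records always land on one partition, while the unkeyed records cycle through both partitions. No partition is excluded from the round-robin just because keyed records also map to it.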

To publish a message to a specific partition, you can use one of the methods below.

  1. Pass the partition explicitly when publishing a message

    /**
     * Creates a record to be sent to a specified topic and partition
     */
    public ProducerRecord(String topic, Integer partition, K key, V value) {
        this(topic, partition, null, key, value, null);
    }

  2. You can create a custom Partitioner and implement your own logic to select the partition

https://kafka.apache.org/10/javadoc/org/apache/kafka/clients/producer/Partitioner.html
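As one possible shape for such custom logic, the sketch below reserves partition 0 for keyed records and spreads unkeyed records round-robin over the remaining partitions. Everything in it is an assumption for illustration: `SimplePartitioner` is a hypothetical, simplified interface and not Kafka's `Partitioner`, whose `partition` method also receives the topic, the serialized key and value, and the cluster metadata.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in interface, NOT org.apache.kafka.clients.producer.Partitioner.
interface SimplePartitioner {
    int partition(String key, int numPartitions);
}

// Keyed records go to partition 0; unkeyed records rotate over partitions 1..n-1.
class ReservedPartitionSketch implements SimplePartitioner {
    private final AtomicInteger counter = new AtomicInteger(0);

    @Override
    public int partition(String key, int numPartitions) {
        if (numPartitions < 2 || key != null) {
            // With a single partition there is nothing to reserve; keyed records use 0.
            return 0;
        }
        // Round-robin over the partitions that are not reserved for keyed records.
        int next = (counter.getAndIncrement() & 0x7fffffff) % (numPartitions - 1);
        return 1 + next;
    }

    public static void main(String[] args) {
        SimplePartitioner partitioner = new ReservedPartitionSketch();
        String[] keys = {"employee-123", null, null, "employee-345", null};
        for (String key : keys) {
            System.out.println("key=" + key + " -> partition "
                    + partitioner.partition(key, 3));
        }
    }
}
```

Sending every keyed record to one partition gives up the load spreading that hashing provides, so a layout like this only makes sense when the keyed traffic is small or must be isolated.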

Upvotes: 3

Fatema Khuzaima Sagar

Reputation: 395

Kafka has a well-defined scheme for sending records to partitions and storing them there. As you mentioned, the key ensures that records with the same key go to the same partition. This helps maintain the chronological order of those messages on the topic.

In your case, the two partitions will store the data as follows:

  1. Partition 1: Stores the data that carries a particular key. Records with this key will always go to this partition. This is the concept of custom partitioning. Apart from this, records with a null key will also go to this partition, since they are stored in round-robin fashion.
  2. Partition 2: This partition will contain records that are sent without any key, i.e. the key is null.

Upvotes: -1
