Reputation: 2312
I am trying to debug an issue, and to do so I want to prove that each distinct key goes to only one partition as long as the cluster is not rebalancing.
So I was wondering: for a given topic, is there a way to determine which partition a key is sent to?
Upvotes: 8
Views: 10839
Reputation: 191983
As explained here, or in the source code:
You need the byte[] keyBytes. Assuming it isn't null, you can then run the following using org.apache.kafka.common.utils.Utils:
Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
For strings or JSON, the key bytes are the UTF-8 encoding of the text, and the Utils class has helper functions to produce them.
For Avro, such as Confluent-serialized data, it's a bit more complicated: the bytes are a magic byte, then a schema ID, then the data. See Wire format.
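To make the formula above concrete, here is a minimal standalone sketch for a UTF-8 string key. It assumes the default partitioner is in use, the record has a non-null key, and no explicit partition was set on the record. The murmur2 and toPositive methods are re-implemented here only so the example runs without the Kafka jar, and are intended to mirror the ones in org.apache.kafka.common.utils.Utils. In a real project, call Utils directly and verify against the source of your Kafka version. The class name, key names, and partition count are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Standalone sketch: in real code, prefer org.apache.kafka.common.utils.Utils.
final class KeyPartitionSketch {

    // 32-bit murmur2, intended to mirror Utils.murmur2 (assumption: check your Kafka version).
    static int murmur2(byte[] data) {
        int length = data.length;
        int seed = 0x9747b28c;
        final int m = 0x5bd1e995;
        final int r = 24;

        int h = seed ^ length;
        int length4 = length / 4;

        // Mix four bytes at a time, little-endian.
        for (int i = 0; i < length4; i++) {
            int i4 = i * 4;
            int k = (data[i4] & 0xff)
                    + ((data[i4 + 1] & 0xff) << 8)
                    + ((data[i4 + 2] & 0xff) << 16)
                    + ((data[i4 + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }

        // Mix the remaining 1 to 3 bytes; the fall-through is intentional.
        switch (length % 4) {
            case 3:
                h ^= (data[(length & ~3) + 2] & 0xff) << 16;
            case 2:
                h ^= (data[(length & ~3) + 1] & 0xff) << 8;
            case 1:
                h ^= data[length & ~3] & 0xff;
                h *= m;
        }

        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // Clears the sign bit so the modulo result is never negative.
    static int toPositive(int number) {
        return number & 0x7fffffff;
    }

    // Partition for a UTF-8 string key, following the formula quoted above.
    static int partitionForKey(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        return toPositive(murmur2(keyBytes)) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 6; // hypothetical partition count of the topic
        for (String key : new String[] {"user-1", "user-2", "user-3"}) {
            System.out.println(key + " -> partition " + partitionForKey(key, numPartitions));
        }
    }
}
```

The result depends only on the key bytes and numPartitions, so the same key gives the same partition on every call as long as the topic's partition count does not change.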
In the Kafka Streams API, you should have a ProcessorContext available in your Processor#init, which you can store a reference to and then access in your Processor#process method, such as ctx.recordMetadata().get().partition() (recordMetadata() returns an Optional).
"only goes to 1 partition"
This isn't a guarantee that a partition holds only one key. Hashes can collide, so different keys can land in the same partition.
It makes more sense to say that a given key isn't in more than one partition.
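"if the cluster is not rebalancing"
Rebalancing will still preserve a key's partition value. A consumer group rebalance changes which consumer reads a partition, not which partition a record was written to.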
Upvotes: 14
Reputation: 1179
When you send a message, the partition is determined by the producer's partitioner classes.
If you want to change the logic, implement the org.apache.kafka.clients.producer.Partitioner interface and set the 'partitioner.class' property in the producer config (ProducerConfig) to your class.
Reference documentation: https://kafka.apache.org/documentation/#producerconfigs
Upvotes: 0