I have the following scenario: I have multiple big files (~200M records each) and I want to send them through Kafka. For better performance I want to use Kafka partitioning. My requirement is that all messages for a particular key go to the same partition. For a POC I am currently using 10 Kafka partitions and partitioning the data on a numeric ID field. My logic simply checks the last digit of the ID and sends the record to the corresponding partition, e.g. an ID ending in 7 (***7) always goes to partition 7.
However, this logic does not generalize: the key can be non-numeric, and the number of partitions may be increased or decreased as requirements change.
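For reference, the POC rule above amounts to taking the numeric ID modulo 10, because the last decimal digit of a non-negative number is its remainder mod 10. The sketch below is a hypothetical reconstruction of that rule (the class and method names are illustrative, not from the original code). It shows why the rule is tied to numeric keys and to exactly 10 partitions.

```java
// Hypothetical reconstruction of the POC routing rule: the partition is the
// last decimal digit of a numeric ID, so it only fits a 10-partition topic.
public class LastDigitRouting {

    // Last decimal digit of the ID; floorMod keeps negative IDs in 0..9.
    static int partitionForId(long id) {
        return (int) Math.floorMod(id, 10L);
    }

    public static void main(String[] args) {
        System.out.println(partitionForId(1234567L)); // 7
        System.out.println(partitionForId(90L));      // 0
    }
}
```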
Is there a hashing algorithm that generates values within a given range? For example, with 10 partitions it should produce hash values from 0 to 9.
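Yes, you can simply take the hash code of the key modulo the number of partitions. That is essentially what Kafka's default partitioner already does for keyed records (it hashes the serialized key bytes with murmur2 and takes the result modulo the partition count), so you might as well just use it.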
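A minimal sketch of the "hash code modulo partition count" idea, assuming a plain Java `hashCode()` is acceptable as the hash (the class and method names are hypothetical). It uses `Math.floorMod` rather than `Math.abs(hash) % n`, because `hashCode()` can be negative and `Math.abs(Integer.MIN_VALUE)` is still negative. Note that Kafka's built-in partitioner hashes the serialized key bytes with murmur2, so this sketch will generally pick different partitions than Kafka would.

```java
import java.util.Objects;

// Sketch: map any key to a partition in [0, numPartitions) using its hash code.
// Not Kafka's algorithm; Kafka's default partitioner uses murmur2 on key bytes.
public class KeyPartitioning {

    static int partitionFor(Object key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // Objects.hashCode returns 0 for null; floorMod always lands in 0..numPartitions-1.
        return Math.floorMod(Objects.hashCode(key), numPartitions);
    }

    public static void main(String[] args) {
        String[] keys = {"customer-42", "ORD-7", "12345"};
        for (String key : keys) {
            System.out.println(key + " -> " + partitionFor(key, 10));
        }
        // The same key always maps to the same partition for a fixed partition count.
        System.out.println(partitionFor("customer-42", 10) == partitionFor("customer-42", 10));
    }
}
```

In a real producer the simpler route is the one the answer suggests: set the record key and let the default partitioner choose. With any hash-modulo scheme, changing the number of partitions changes the mapping for most keys, so records for a key sent before and after a resize may land on different partitions.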