Reputation: 115
In a Kafka deployment a custom topic partitioner logic is used to route all messages that belong to the same root entity (for example all message for particular user) to the same partition.
Can anyone recommend a strategy on how to deal with partitioning logic change in such live system?
One example that affects the partitioning is the obvious change of the partitioner implementation. The other example would be change of the number of partitions for a given topic.
In both cases, we would end up in a situation where some of the messages for user A, that entered the Kafka before the change, will be in partition 1, while after the change in partitioning logic or number of partitions messages for that same user A will go the partition 2.
This can lead to a problem where messages for user A are processed out of order. Consumer reading the messages from partition 2 could process messages before the consumer that reads the messages from partition 1.
Have anyone faced this issue in live system? How did you or would you solve this issue?
This seems like a very common scenario, but I was not able to find anything about it.
Thanks
Upvotes: 2
Views: 118
Reputation: 3433
The best way to change how records are partitioned is to use the default Apache Kafka® partitioner, and change the record keys. If all records from a user need to go to the same topic then make sure they all have the same key.
If you'd like to change the keys for a whole set you can use KSQL to re-key (republish to a new topic with new keys) the data using the PARTITION BY
function.
Upvotes: 0
Reputation: 5259
By partitioning logic, if you meant partitioning algorithm, I do not understand how that would just change like that. As for increasing partitions, it is in theory not possible to achieve increasing of partitions while guaranteeing the order of messages. -- there is a KIP for that, but its status is still "under discussion".
What I do usually when I increase partitions is to accept a small downtime.
The playbook is like this:
Stop the producer
Monitor the lag for the consumer group
Once lag is zero, shut down the consumers
Increase the number of partitions
Start the consumers
Start the producers
This way, you can be sure that there are no message losses and no out of order message consumption.
If you want to avoid a downtime, you may have to rely on an external system which can temporarily hold the data per partition in order and publish, but that solution depends on a few things
Upvotes: 1