Reputation: 11
I have a Kafka cluster with a topic that had too few partitions, so a large backlog of messages accumulated. After I added more partitions, only newly produced messages were balanced across all the partitions; the backlog stayed in the original ones.
What is the preferred way to balance the "old" backlog of messages inside the original partitions across all the new partitions?
I thought of reading the whole backlog and writing it back to the same topic, then updating the offsets accordingly, but that would duplicate messages for any new consumer group that starts consuming from the beginning of the topic.
Upvotes: 1
Views: 1795
Reputation: 191681
You can reassign partitions to new brokers, but you can't move the existing segments of one partition into other partitions.
You would need to consume all the data and push it onto a new topic with more partitions in order to spread it back out. If you really care about consumers not reading that data twice, you would also need to track which data has already been consumed, ideally by a UUID generated on the producer side rather than just an offset or timestamp. Alternatively, you could coordinate stopping the producers, let your consumers read the remainder of the messages, and then migrate both producers and consumers to a brand new topic with more partitions.
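To illustrate the UUID idea, here is a minimal in-memory sketch in plain Python with no Kafka client involved. The names `Message`, `message_id`, `republish` and `DedupingConsumer` are hypothetical and only for illustration. It assumes the producer attaches an ID once and that the ID is preserved when the backlog is re-published. It also uses round-robin placement, which would not keep per-key ordering, and an unbounded in-memory set, which a real consumer would likely replace with a persistent store.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Message:
    payload: str
    # Hypothetical field: assigned once by the producer, kept on re-publish.
    message_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def republish(backlog, num_partitions):
    """Spread the backlog round-robin over num_partitions lists.

    The same Message objects (and therefore the same message_id values)
    are reused, which stands in for copying the ID along with the payload.
    """
    partitions = [[] for _ in range(num_partitions)]
    for index, message in enumerate(backlog):
        partitions[index % num_partitions].append(message)
    return partitions


class DedupingConsumer:
    """Processes each message_id at most once."""

    def __init__(self):
        self._seen = set()   # IDs already processed (unbounded in this sketch)
        self.processed = []  # payloads actually handled

    def handle(self, message):
        if message.message_id in self._seen:
            return False     # duplicate, skip it
        self._seen.add(message.message_id)
        self.processed.append(message.payload)
        return True


if __name__ == "__main__":
    backlog = [Message(f"m{i}") for i in range(6)]
    new_partitions = republish(backlog, 3)

    consumer = DedupingConsumer()
    for message in backlog:            # consumer already read the old data
        consumer.handle(message)
    for partition in new_partitions:   # then sees the re-published copies
        for message in partition:
            consumer.handle(message)

    print(consumer.processed)
```

A consumer that has already read the backlog skips every re-published copy, because the decision is based on the producer-side ID rather than on the offset, which changes when the message lands in a different partition.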
Upvotes: 1