Reputation: 764
Any idea how to do Kafka-to-Kafka mirroring with sampling (for example, forwarding only 10% of the messages)?
Upvotes: 1
Views: 1021
Reputation: 4564
You could use MirrorMakerMessageHandler (which is configured by the message.handler parameter):
https://github.com/apache/kafka/blob/1.0/core/src/main/scala/kafka/tools/MirrorMaker.scala#L430
The handler itself would need to decide whether to forward each message. A simple implementation would just keep a counter of messages received and forward a message when 0 == counter % 10.
However, this handler is invoked for every message received, so you'd still be receiving all of the messages and throwing away 90% of them.
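A minimal sketch of the counter idea in Java, kept free of Kafka dependencies so it runs on its own. The class name EveryNthSampler and the method shouldForward are hypothetical, not part of Kafka. As far as I can tell from the linked 1.0 source, a real handler's handle method would call something like this and return an empty list for messages that should be dropped, but check that against the version you run.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper (not a Kafka class): decides whether the current
// message falls into the sample, based only on how many were seen so far.
final class EveryNthSampler {
    private final int n;
    private final AtomicLong counter = new AtomicLong();

    EveryNthSampler(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        this.n = n;
    }

    // True for the 1st, (n+1)th, (2n+1)th, ... call, i.e. when 0 == counter % n.
    boolean shouldForward() {
        return counter.getAndIncrement() % n == 0;
    }

    public static void main(String[] args) {
        EveryNthSampler sampler = new EveryNthSampler(10);
        int forwarded = 0;
        for (int i = 0; i < 100; i++) {
            if (sampler.shouldForward()) {
                forwarded++;
            }
        }
        // prints "10 of 100 messages forwarded"
        System.out.println(forwarded + " of 100 messages forwarded");
    }
}
```

Each MirrorMaker instance would keep its own counter, so with several instances the overall rate is still roughly 1 in 10, but which messages get picked depends on how partitions are assigned.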
The alternative is to modify the main loop, where the mirror maker consumer receives a message and forwards it to the producers (which send it to the mirror cluster). It is here:
https://github.com/apache/kafka/blob/1.0/core/src/main/scala/kafka/tools/MirrorMaker.scala#L428
You would need to modify the consumer part to either-or:
I prefer the former idea, since with multiple MM instances in the same consumer group you would still get reasonable behaviour. The second choice would demand more work from you to handle reassignments.
Also, telling which messages belong to the 10% is non-trivial; I just assumed that it's every 10th message received.
Upvotes: 2