Gal Shaboodi

Reputation: 764

Kafka to Kafka mirroring with sampling

Any idea how to set up Kafka-to-Kafka mirroring with sampling (for example, forwarding only 10% of the messages)?

Upvotes: 1

Views: 1021

Answers (1)

Adam Kotwasinski

Reputation: 4564

You could use a custom MirrorMakerMessageHandler (configured with the --message.handler option):

https://github.com/apache/kafka/blob/1.0/core/src/main/scala/kafka/tools/MirrorMaker.scala#L430

The handler itself would need to decide whether to forward each message. A simple implementation would just count the messages received and forward a message only if 0 == counter % 10.
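A minimal sketch of that counting logic in Java, kept free of Kafka dependencies. The class name EveryNthSampler is hypothetical. The wiring into MirrorMaker is an assumption based on the 1.0 source linked above and is not shown here. A custom handler class would be passed via --message.handler, and its argument (here 10) would go through --message.handler.args. Its handle(record) would return the forwarded producer record when shouldForward() is true and an empty list otherwise. The counter is an AtomicLong because the same handler instance may be called from several consumer threads.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper (not part of Kafka): decides whether the current
// message is an "every N-th" one, using a shared message counter.
final class EveryNthSampler {
    private final long n;
    // AtomicLong so that concurrent callers never observe the same count twice.
    private final AtomicLong counter = new AtomicLong();

    EveryNthSampler(long n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive, got " + n);
        }
        this.n = n;
    }

    // True for the 1st, (n+1)-th, (2n+1)-th ... call, i.e. when 0 == counter % n.
    boolean shouldForward() {
        return counter.getAndIncrement() % n == 0;
    }
}
```

For example, new EveryNthSampler(10) returns true for exactly 10 out of any 100 consecutive calls, starting with the first one.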

However, this handler is invoked for every message received, which means you would still be receiving all of the messages and throwing away 90% of them.


The alternative is to modify the main loop, where the MirrorMaker consumer receives each message and forwards it to the producers (which send it to the mirror cluster). That loop is here:

https://github.com/apache/kafka/blob/1.0/core/src/main/scala/kafka/tools/MirrorMaker.scala#L428

You would need to modify the consumer part to do one of the following:

  • forward only every N-th (e.g. 10th) message/offset
  • seek directly to every N-th message in the log
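Both options can be phrased in terms of the per-partition offset rather than a local counter. The helpers below are a hypothetical sketch, again without Kafka dependencies. isSampled implements the first option for a record at a given offset. nextSampledOffset computes where the second option would seek next; with the real consumer that would be a call like consumer.seek(partition, nextOffset). Because the decision depends only on the offset, it stays the same whichever MM instance currently owns the partition. Offsets can have gaps (for example in compacted topics), so this yields roughly, not exactly, every 10th message.

```java
// Hypothetical helpers (not part of Kafka) for offset-based sampling.
final class OffsetSampling {
    private OffsetSampling() {}

    // Option 1: forward a record only if its offset is a multiple of n.
    static boolean isSampled(long offset, long n) {
        requirePositive(n);
        return offset % n == 0;
    }

    // Option 2: the smallest multiple of n strictly greater than the offset
    // just processed; the consumer would seek there and skip the records in between.
    static long nextSampledOffset(long offset, long n) {
        requirePositive(n);
        return (offset / n + 1) * n;
    }

    private static void requirePositive(long n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive, got " + n);
        }
    }
}
```

With n = 10, offsets 0, 10, 20, ... are forwarded, and after processing offset 7 the consumer would seek to offset 10.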

I prefer the former idea: with multiple MirrorMaker (MM) instances in the same consumer group, you would still get reasonable behaviour. The second choice would demand more work from you to handle partition reassignments.

Also, deciding which messages make up the 10% is non-trivial; I just assumed it means every 10th message received.
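One alternative reading of "10%" is to forward each message independently with probability 0.1. A fixed every-10th pattern can line up with a regular arrival pattern. For example, if records arrive round-robin from 10 partitions, it keeps messages from only one partition; independent random choices avoid that bias, at the cost of yielding about 10% on average rather than exactly. The class below is a hypothetical sketch using plain java.util.Random, seeded so runs are reproducible. It could stand in for the counter in the handler.

```java
import java.util.Random;

// Hypothetical helper (not part of Kafka): forwards each message
// independently with a fixed probability.
final class RandomSampler {
    private final double probability;
    private final Random random; // java.util.Random instances are thread-safe

    RandomSampler(double probability, long seed) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be in [0, 1], got " + probability);
        }
        this.probability = probability;
        this.random = new Random(seed);
    }

    // nextDouble() is uniform in [0, 1), so this is true with the given probability.
    boolean shouldForward() {
        return random.nextDouble() < probability;
    }
}
```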

Upvotes: 2
