maverick

Reputation: 2345

Message pre-processing (topic - topic) - Kafka Connect API vs. Streams vs Kafka Consumer?

We need to do some pre-processing on every message (decrypt/re-encrypt with different keys) from one topic into another.
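For context, the per-message transform itself is just decrypt-then-encrypt. A minimal sketch with the JDK's javax.crypto, assuming symmetric AES keys (the keys, cipher mode, and class name here are illustrative; a real deployment would use an authenticated mode such as AES/GCM with per-message IVs):

```java
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;

public class ReEncryptor {
    private final SecretKeySpec oldKey;
    private final SecretKeySpec newKey;

    public ReEncryptor(byte[] oldKeyBytes, byte[] newKeyBytes) {
        this.oldKey = new SecretKeySpec(oldKeyBytes, "AES");
        this.newKey = new SecretKeySpec(newKeyBytes, "AES");
    }

    // Decrypt with the old key, re-encrypt the plaintext with the new key
    public byte[] reEncrypt(byte[] ciphertext) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
        cipher.init(Cipher.DECRYPT_MODE, oldKey);
        byte[] plaintext = cipher.doFinal(ciphertext);
        cipher.init(Cipher.ENCRYPT_MODE, newKey);
        return cipher.doFinal(plaintext);
    }

    public static void main(String[] args) throws Exception {
        byte[] keyA = "0123456789abcdef".getBytes(StandardCharsets.UTF_8);
        byte[] keyB = "fedcba9876543210".getBytes(StandardCharsets.UTF_8);

        // Encrypt a message under key A, as if it arrived on the source topic
        Cipher enc = Cipher.getInstance("AES/ECB/PKCS5Padding");
        enc.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(keyA, "AES"));
        byte[] underA = enc.doFinal("hello".getBytes(StandardCharsets.UTF_8));

        byte[] underB = new ReEncryptor(keyA, keyB).reEncrypt(underA);

        // Decrypt with key B to confirm the round trip
        Cipher dec = Cipher.getInstance("AES/ECB/PKCS5Padding");
        dec.init(Cipher.DECRYPT_MODE, new SecretKeySpec(keyB, "AES"));
        System.out.println(new String(dec.doFinal(underB), StandardCharsets.UTF_8)); // prints "hello"
    }
}
```

The question is really about which framework should own the surrounding consume/produce loop, not this transform.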

I've been looking into using Kafka Connect since it provides a lot of good things out of the box (config management, offset storage, error handling, etc.).

But it also feels like I'd end up implementing SourceConnector and SinkConnector just to move data between two topics, and neither of those interfaces is meant to do Topic A -> (Connector) -> Topic B. Is this the right approach? Should I just use a SinkConnector alone and have my SinkTask.put() do all the logic to write to Kafka?

Other options are KafkaConsumer/Producer or Streams, but these would then need their own instance(s) to run the logic, plus their own offset, retry, and error handling.
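The plain consumer/producer variant would look roughly like this (topic names, the group id, and the reEncrypt helper are placeholders for my actual setup):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ReEncryptBridge {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "re-encrypt-bridge");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("enable.auto.commit", "false"); // commit only after the produce succeeds

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
             KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            consumer.subscribe(List.of("topic-a"));
            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    byte[] newValue = reEncrypt(record.value());
                    producer.send(new ProducerRecord<>("topic-b", record.key(), newValue));
                }
                producer.flush();
                consumer.commitSync(); // offsets advance only once the batch is on topic-b
            }
        }
    }

    private static byte[] reEncrypt(byte[] value) {
        return value; // placeholder for the decrypt/re-encrypt logic
    }
}
```

It's not much code, but the deployment, scaling, and failure handling around it are exactly what I was hoping Connect would give me for free.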

Upvotes: 0

Views: 820

Answers (1)

OneCricketeer

Reputation: 191671

provides a lot of good things out of the box (config management, offset storage, error handling, etc.)

Config management shouldn't be harder than re-deploying an application, but that depends on any version control or CI/CD pipelines you may or may not have.

Kafka Producer/Consumer and Streams offer offset management too; you only have to configure it if you want anything other than the defaults.

Error handling is fairly well documented; don't just fire-and-forget if you care about detecting errors. Connect itself will stop consuming and producing under critical error conditions rather than retry or skip messages.
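To illustrate "anything but the defaults": the overrides typically involved for a pipeline like yours are along these lines (a sketch, and the values are illustrative, not recommendations for your workload):

```java
import java.util.Properties;

public class NonDefaultConfigs {
    // Consumer side: take over offset commits so they only advance after processing
    static Properties consumerOverrides() {
        Properties p = new Properties();
        p.put("enable.auto.commit", "false");   // commit yourself, after the produce succeeds
        p.put("auto.offset.reset", "earliest"); // where to start when no committed offset exists
        return p;
    }

    // Producer side: trade latency for delivery guarantees instead of fire-and-forget
    static Properties producerOverrides() {
        Properties p = new Properties();
        p.put("acks", "all");                // wait for full acknowledgement of each send
        p.put("enable.idempotence", "true"); // avoid duplicates when retrying transient failures
        return p;
    }
}
```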

"Neither of those interfaces are meant to do Topic A -> (Connector) -> Topic B"

Have you seen Confluent Replicator (licensed product)? That is Kafka Connect between two topics.

Otherwise, have you seen MirrorMaker? That is a producer-consumer pair, commonly used for replicating data between separate clusters, but it can be run with the same source and destination cluster settings; you just need to ensure you don't create a feedback loop. You would apply your "custom logic" (and change the topic name) with what is called a Handler class, placed on your Kafka classpath:
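A handler sketch might look like the following. Caveat: the handler trait lives inside kafka.tools.MirrorMaker in the older (pre-MirrorMaker-2) tool, and the record type it handles has changed across Kafka versions, so treat the type names and signature here as assumptions to check against the version on your classpath:

```java
import java.util.Collections;
import java.util.List;
import kafka.consumer.BaseConsumerRecord;
import kafka.tools.MirrorMaker;
import org.apache.kafka.clients.producer.ProducerRecord;

// Sketch: re-encrypt each value and redirect it to a different topic.
// The destination topic name and reEncrypt() are placeholders.
public class ReEncryptHandler implements MirrorMaker.MirrorMakerMessageHandler {
    @Override
    public List<ProducerRecord<byte[], byte[]>> handle(BaseConsumerRecord record) {
        byte[] newValue = reEncrypt(record.value());
        return Collections.singletonList(
            new ProducerRecord<>("topic-b", record.key(), newValue));
    }

    private byte[] reEncrypt(byte[] value) {
        return value; // placeholder for decrypt/re-encrypt
    }
}
```

Returning a list means a handler can also drop a record (empty list) or fan it out into several.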

bin/kafka-mirror-maker.sh

...

--message.handler <String: A custom      Message handler which will process
  message handler of type                  every record in-between consumer and
  MirrorMakerMessageHandler>               producer.
--message.handler.args <String:          Arguments used by custom message
  Arguments passed to message handler      handler for mirror maker.
  constructor.>

Confluent MirrorMaker documentation
Kafka MirrorMaker documentation


There is nothing stopping you from implementing the Connect API, and it may be easier to manage than a Kafka Streams application without an external cluster manager. Plus, since Connect is a Java library, you could in theory use the Streams library internally within it.
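For comparison, the Streams version of this topic-to-topic transform is a one-liner topology; it just has to run as its own application. A sketch (topic names, application id, and the reEncrypt helper are placeholders):

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class ReEncryptStream {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "re-encrypt-stream");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("topic-a", Consumed.with(Serdes.ByteArray(), Serdes.ByteArray()))
               .mapValues(ReEncryptStream::reEncrypt) // per-record transform
               .to("topic-b", Produced.with(Serdes.ByteArray(), Serdes.ByteArray()));

        new KafkaStreams(builder.build(), props).start();
    }

    private static byte[] reEncrypt(byte[] value) {
        return value; // placeholder for decrypt/re-encrypt
    }
}
```

Streams handles offsets and retries for you via its consumer group; what it doesn't give you is Connect's REST-driven config management or a worker cluster to deploy into.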

Upvotes: 2
