Black Glix

Reputation: 709

Does it make sense to use kafka-connect to transform kafka messages?

We have Confluent Platform in our infrastructure. At its core, we are using a Kafka broker to distribute events. Dozens of devices produce events to Kafka topics (there is a Kafka topic for each type of event), where events are serialized with Google's Protobuf. We use Confluent's Schema Registry to keep track of the Protobuf schemas.

What we need is: for several events, we have to apply some transformation and then publish the transformation output to another Kafka topic. Of course, Kafka Streams is one way to accomplish that, like in this example. However, we don't want to have a Java application for each transformation (which increases the complexity of the project and the development/deployment effort), and it doesn't feel right to put all streams in one application (modifying one would require stopping and restarting all of them).
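For illustration, a minimal per-transformation Streams application would look roughly like this (the topic names and the transform body are placeholders, not our actual code):

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;

    public class TemperatureEventTransform {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "temperature-event-transform");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.ByteArray().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            builder.<String, byte[]>stream("temperature-events")
                   // placeholder: deserialize the Protobuf payload, modify it, re-serialize it
                   .mapValues(value -> value)
                   .to("temperature-events-transformed");

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }

Multiplying this per event type is exactly the development/deployment overhead we would like to avoid.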

At this point, we thought that Confluent's Kafka Connect might be a better approach. We can have several workers, and we can deploy them into one Kafka Connect instance or cluster. The question is:

Does it make sense to use Kafka Connect to get messages from one Kafka topic and send them to another Kafka topic? Because all the use cases and examples aim at getting data from outside (a database, files, etc.) into Kafka, or from Kafka to the outside.

Upvotes: 1

Views: 990

Answers (1)

OneCricketeer

Reputation: 191681

To clarify, Kafka Connect is not "Confluent's"; it's part of Apache Kafka.

While you could use MirrorMaker2/Confluent Replicator with transforms, it honestly wouldn't be much different from extracting the transformation logic into a shared library, then bundling a deployable Kafka Streams application that accepts configuration parameters for the input and output topics, with the transformation in between.
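As a rough sketch of that idea (the environment variable names and the in-code transform map are stand-ins for whatever your shared library actually provides), such a parameterized Streams application could look like:

    import java.util.Map;
    import java.util.Properties;
    import java.util.function.UnaryOperator;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;

    public class ConfigurableTransformApp {

        // Stand-in for the shared transformation library; each entry would
        // deserialize the Protobuf payload, modify it, and re-serialize it.
        private static final Map<String, UnaryOperator<byte[]>> TRANSFORMS =
                Map.of("identity", value -> value);

        public static void main(String[] args) {
            // All deployment-specific values come from the environment, so the same
            // artifact can be deployed once per transformation without code changes.
            String inputTopic  = System.getenv("INPUT_TOPIC");
            String outputTopic = System.getenv("OUTPUT_TOPIC");
            String transformId = System.getenv("TRANSFORM_ID");

            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "transform-" + transformId);
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, System.getenv("BOOTSTRAP_SERVERS"));
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.ByteArray().getClass());

            UnaryOperator<byte[]> transform = TRANSFORMS.getOrDefault(transformId, value -> value);

            StreamsBuilder builder = new StreamsBuilder();
            builder.<String, byte[]>stream(inputTopic)
                   .mapValues(transform::apply)
                   .to(outputTopic);

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }

Deploying one instance of that per transformation, each with its own environment variables, keeps the streams independently startable and stoppable without touching the others.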

You make a good point about a single point of administration, but that's also a single point of failure... If you use Connect, changing your transform plugin will also require you to stop and restart the Connect server, and if all topics are part of the same connector, then any task failure would stop some percentage of the topic transformations.

Kafka Streams (or KSQL) is preferred for intra-cluster transformations, anyway.

You could also look at solutions like Apache NiFi for more complex event management and routing.

Upvotes: 1
