bhalochele

Reputation: 227

Kafka Connect or Kafka client?

I need to fetch messages from Kafka topics and notify other systems via HTTP-based APIs. That is, get a message from a topic, map it to the 3rd-party APIs, and invoke them. I intend to write a Kafka Sink Connector for this.

For this use case, is Kafka Connect the right choice, or should I go with a Kafka client?
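Roughly, the sink task side would need to do something like the sketch below. This is only an illustration of the intent, not a working connector: the class name, the "http.endpoint" config key, and the absence of retries and error handling are all placeholders.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.Collection;
    import java.util.Map;

    import org.apache.kafka.connect.sink.SinkRecord;
    import org.apache.kafka.connect.sink.SinkTask;

    // Hypothetical sink task that forwards each record's value to an HTTP endpoint.
    public class HttpSinkTask extends SinkTask {

        private HttpClient client;
        private String endpoint;

        @Override
        public void start(Map<String, String> props) {
            client = HttpClient.newHttpClient();
            endpoint = props.get("http.endpoint"); // placeholder config key
        }

        @Override
        public void put(Collection<SinkRecord> records) {
            for (SinkRecord record : records) {
                // Assumes the record value is already a JSON string.
                HttpRequest request = HttpRequest.newBuilder(URI.create(endpoint))
                        .header("Content-Type", "application/json")
                        .POST(HttpRequest.BodyPublishers.ofString(String.valueOf(record.value())))
                        .build();
                try {
                    client.send(request, HttpResponse.BodyHandlers.ofString());
                } catch (Exception e) {
                    // A real connector would retry or report the failure to the framework.
                    throw new RuntimeException("Failed to deliver record from " + record.topic(), e);
                }
            }
        }

        @Override
        public void stop() {
            // Nothing to clean up in this sketch.
        }

        @Override
        public String version() {
            return "0.1.0";
        }
    }

A full connector would also need a matching SinkConnector class and a config definition; the above only shows the per-record task logic.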

Upvotes: 8

Views: 5691

Answers (4)

fgul

Reputation: 6501

In the book Kafka in Action, this is explained as follows:

The purpose of Kafka Connect is to help move data in or out of Kafka without having to deal with writing our own producers and clients. Connect is a framework that is already part of Kafka and that really can make it simple to use pieces that have already been built to start your streaming journey.

As for your problem: first, one of the simplest questions to ask is whether you can modify the application code of the systems you need to interact with.

Second, if you have in-depth knowledge of those systems and the custom connector you would write will be used by others, it is worth it, because it may help people who are not experts in those systems. Otherwise, if the connector would be used only by yourself, I think you should just write your own client code, since you get more flexibility and a simpler implementation.

Upvotes: 0

Touraj Ebrahimi

Reputation: 576

You should use a Kafka Connect sink when you are using a Kafka Connect source to produce messages to a specific topic.

For example, when you use a file source, you should use a file sink to consume what the source has produced; when you use a JDBC source, you should use a JDBC sink to consume what you have produced.

This is because the schemas of the producer and the sink consumer should be compatible, so you should use a compatible source and sink on both sides.

If the schemas are not compatible, you can use the SMT (Single Message Transform) capability, available since Kafka 0.10.2, to write message transformers that convert messages between otherwise incompatible producers and consumers.

Note: if you want to move messages more efficiently, I suggest using Avro and the Schema Registry.

If you can code in Java, you can use Kafka Streams, the Spring Kafka project, or another stream processing approach to achieve what you want.
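For instance, a minimal Kafka Streams sketch that reads a topic and forwards each value to an HTTP endpoint could look like the following. The application id, broker address, topic name, and URL are assumptions, and the blocking HTTP call without error handling is a simplification.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.Properties;

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.Consumed;

    public class HttpForwardingStream {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "http-forwarder");   // assumed id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker

            HttpClient client = HttpClient.newHttpClient();
            StreamsBuilder builder = new StreamsBuilder();

            // Read each record from the input topic and POST its value to a third-party API.
            builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()))
                   .foreach((key, value) -> {
                       HttpRequest request = HttpRequest
                               .newBuilder(URI.create("https://example.com/api")) // placeholder URL
                               .POST(HttpRequest.BodyPublishers.ofString(value))
                               .build();
                       try {
                           client.send(request, HttpResponse.BodyHandlers.ofString());
                       } catch (Exception e) {
                           throw new RuntimeException(e);
                       }
                   });

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
        }
    }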

Upvotes: 0

vaquar khan

Reputation: 11449

Use Kafka clients when you have full control over your code, you are an experienced developer, you want to connect an application to Kafka, and you can modify the application's code to:

• push data into Kafka

• pull data from Kafka

https://cwiki.apache.org/confluence/display/KAFKA/Clients
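As a rough sketch of that client approach, "push data into Kafka" could look like this (the broker address, topic name, and payload are assumptions):

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class PushExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            // "Push data into Kafka": the application code owns the producer.
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("events", "key-1", "{\"hello\":\"world\"}"));
            }
        }
    }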


Use Kafka Connect when you don't have control over third-party code, you are new to Kafka, or you need to connect Kafka to datastores whose code you cannot modify.

Kafka Connect’s scope is narrow: it focuses only on copying streaming data to and from Kafka and does not handle other tasks.

http://docs.confluent.io/2.0.0/connect/


I am adding a few lines from other blogs to explain the differences:

Companies that want to adopt Kafka write a bunch of code to publish their data streams. What we’ve learned from experience is that doing this correctly is more involved than it seems. In particular, there are a set of problems that every connector has to solve:

• Schema management: The ability of the data pipeline to carry schema information where it is available. In the absence of this capability, you end up having to recreate it downstream. Furthermore, if there are multiple consumers for the same data, then each consumer has to recreate it. We will cover the various nuances of schema management for data pipelines in a future blog post.

• Fault tolerance: Run several instances of a process and be resilient to failures

• Parallelism: Horizontally scale to handle large scale datasets

• Latency: Ingest, transport and process data in real-time, thereby moving away from once-a-day data dumps.

• Delivery semantics: Provide strong guarantees when machines fail or processes crash

• Operations and monitoring: Monitor the health and progress of every data integration process in a consistent manner

These are really hard problems in their own right, and it just isn't feasible to solve them separately in each connector. Instead you want a single infrastructure platform that connectors can build on, one that solves these problems in a consistent way.

Until recently, adopting Kafka for data integration required significant developer expertise; developing a Kafka connector required building on the client APIs.

https://www.confluent.io/blog/announcing-kafka-connect-building-large-scale-low-latency-data-pipelines/

Upvotes: 3

Ewen Cheslack-Postava

Reputation: 1431

Kafka Connect will work well for this purpose, but this would also be a pretty straightforward consumer application, because consumers get the same benefits of fault tolerance and scalability, and in this case you're probably just doing simple message-at-a-time processing within each consumer instance. You can also easily use enable.auto.commit for this application, so you won't run into the trickier parts of using the consumer directly. The main thing Kafka Connect would give you over the consumer here is that the connector could be made generic for different input formats, but that may not be important to you for a custom connector.
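A rough sketch of that plain-consumer approach, with enable.auto.commit on as suggested above (the broker address, group id, topic name, and endpoint URL are all assumptions, and error handling is omitted):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class HttpNotifyingConsumer {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "http-notifier");           // assumed group id
            props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");          // auto-commit offsets
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            HttpClient client = HttpClient.newHttpClient();

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("input-topic"));     // assumed topic
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        // Message-at-a-time processing: call the third-party API per record.
                        HttpRequest request = HttpRequest
                                .newBuilder(URI.create("https://example.com/api")) // placeholder URL
                                .header("Content-Type", "application/json")
                                .POST(HttpRequest.BodyPublishers.ofString(record.value()))
                                .build();
                        client.send(request, HttpResponse.BodyHandlers.ofString());
                    }
                }
            }
        }
    }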

Upvotes: 1
