Anuja Barve

Reputation: 320

Kafka Connect and Streams

I recently started reading about Kafka, and I am a little confused about the difference between Kafka Connect and Kafka Streams. By definition, Kafka Streams can read data from a Kafka topic, process it, and push the output to another Kafka topic, while Kafka Connect moves large data sets into and out of Kafka.

My question is: why do we need Kafka Connect when Kafka Streams can pretty much read the data, process it, and push it to a topic? Why the extra component? It would be great if someone could explain the difference. Thanks in advance :)

Upvotes: 2

Views: 1898

Answers (3)

Namjith Aravind

Reputation: 434

Kafka Connect: Since Kafka acts as a data hub, it has to connect to all kinds of external data sources and import data from them. These integrations all follow the same pattern, so a common framework and standard for this purpose is useful and clean. That is why Kafka Connect exists. It is just a bridge. No substantial data transformation happens here, because that is not its purpose.

Kafka Streams: It is built specifically for data transformation, so the computation-related APIs (filtering, mapping, joins, aggregations) are available here.

Upvotes: 3

OneCricketeer

Reputation: 191671

Kafka Connect shouldn't be used for extensive filtering or for data transformations larger than selecting fields. There's a Kafka Summit talk on when not to use Single Message Transforms (SMTs).

Kafka Streams can be embedded into any Java application and used as a kind of in-memory key-value store. For example, one could write a web app and use a KTable as a database that is backed by Kafka. Otherwise, it is just a higher-level library than the producer and consumer, restricted to working with data in a single Kafka cluster. KSQL is an additional layer on top of it.

Kafka Connect, on the other hand (while it probably could be embedded; see Debezium's embedded mode), is meant to be more "hands off": if a connector exists, all you need is config files, without writing any code yourself.
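To make the "config files only" point concrete, here is a minimal sketch of a standalone source connector config. It uses the FileStreamSource example connector that ships with Apache Kafka; the connector name, file path, and topic name are placeholders I picked for illustration, not anything from the question.

```properties
# Hypothetical file name: connect-file-source.properties
# Connector instance name (placeholder)
name=local-file-source
# Example connector bundled with Apache Kafka
connector.class=FileStreamSource
tasks.max=1
# Each line appended to this file becomes one record (placeholder path)
file=/tmp/input.txt
# Destination Kafka topic (placeholder name)
topic=connect-test
```

Assuming a local broker and the stock worker config, something like `bin/connect-standalone.sh config/connect-standalone.properties connect-file-source.properties` would start it. Note that no Java code is involved at any point.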

Upvotes: 2

codejitsu

Reputation: 3182

Kafka Streams is a stream processing library for Apache Kafka. You can use it to build streaming applications that read data from and write data to Kafka topics. It's a general-purpose library.

On the flip side, Kafka Connect is a "data integration" framework. Usually you use Kafka Connect to import data from some data system, such as a relational database, into a Kafka topic. You can use the same framework for data export as well.

There are many connectors for different data storage systems: HDFS, relational databases, Elasticsearch, and more.
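For comparison with the config-driven style of Kafka Connect, here is a rough sketch of what a small Kafka Streams application might look like. It assumes the `kafka-streams` dependency is on the classpath and a broker at `localhost:9092`; the class name, the topic names (`input-topic`, `output-topic`), the application id, and the `normalize` helper are all hypothetical and chosen only for illustration.

```java
import java.util.Locale;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class UppercaseApp {

    // Hypothetical per-record transformation: trim and upper-case the value.
    static String normalize(String value) {
        return value.trim().toUpperCase(Locale.ROOT);
    }

    // Builds the processing graph: read -> filter -> transform -> write.
    static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()))
               // Drop null or blank values before transforming them.
               .filter((key, value) -> value != null && !value.trim().isEmpty())
               .mapValues(value -> normalize(value))
               .to("output-topic", Produced.with(Serdes.String(), Serdes.String()));
        return builder.build();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder application id and broker address.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        KafkaStreams streams = new KafkaStreams(buildTopology(), props);
        // Close the streams instance cleanly when the JVM shuts down.
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        streams.start();
    }
}
```

The point of the sketch is that both ends are Kafka topics: the library has no notion of an external database or file system, which is the gap Kafka Connect fills.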

One possible scenario using both components (Kafka Connect and Kafka Streams) would be:

1. Continuously import data from a relational database into a Kafka topic using Kafka Connect.
2. Process that data with a Kafka Streams app, which writes its results to an output topic.
3. Export data from that output topic into Elasticsearch using Kafka Connect.

[1] This blog post is a good overview of both technologies working together: https://www.confluent.io/blog/hello-world-kafka-connect-kafka-streams/
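The import and export ends of such a pipeline could be sketched as two connector configs. This assumes the Confluent JDBC source and Elasticsearch sink connector plugins are installed on the Connect worker; the database URL, credentials, table, column, and topic names are all made up for illustration, and the exact set of required properties can differ between connector versions.

```properties
# Import: relational database -> Kafka topic "db-orders" (hypothetical names)
name=jdbc-orders-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:postgresql://localhost:5432/shop
connection.user=connect
connection.password=secret
# Poll for new rows using a strictly increasing id column
mode=incrementing
incrementing.column.name=id
table.whitelist=orders
topic.prefix=db-
```

```properties
# Export: Kafka topic "orders-enriched" -> Elasticsearch (hypothetical names)
name=es-orders-sink
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
topics=orders-enriched
connection.url=http://localhost:9200
key.ignore=true
schema.ignore=true
```

In between, a Kafka Streams app would read `db-orders` and write `orders-enriched`. That middle step is the only part of the pipeline that needs custom code.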

Upvotes: 8
