Reputation: 2853
I have been implementing an ETL data pipeline using Apache Kafka. I have used Kafka Connect for the extract and load stages.
Connect reads the source data and writes it to a Kafka topic, where the actual data is available in JSON form.
In the transform phase, I want to read the JSON data from a Kafka topic, convert it into SQL queries based on some custom business logic, and then write those queries to an output Kafka topic.
As of now, I have written a producer-consumer application which reads from the topic, does the transformation, and then writes to the output topic.
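Roughly, my current application looks like the sketch below (the topic names, bootstrap server, and the buildSqlFromJson helper are placeholders for my actual setup and business logic):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class JsonToSqlTransformer {

    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "json-to-sql-transformer");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {

            consumer.subscribe(Collections.singletonList("connect-source-topic"));

            // Poll the input topic, transform each JSON record, and forward the SQL to the output topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    String sql = buildSqlFromJson(record.value());
                    producer.send(new ProducerRecord<>("sql-output-topic", record.key(), sql));
                }
            }
        }
    }

    // Custom business logic: JSON in, SQL statement out (placeholder)
    private static String buildSqlFromJson(String json) {
        return "INSERT INTO target_table VALUES (...)";
    }
}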
Is it possible to achieve the same using the Kafka Streams API? If yes, please provide some samples.
Upvotes: 2
Views: 1584
Reputation: 32060
Check out Kafka Streams, or KSQL. KSQL runs on top of Kafka Streams, and gives you a very simple way to build the kind of aggregations that you're talking about.
Here's an example of aggregating a stream of data in KSQL:
SELECT PAGE_ID, COUNT(*) FROM PAGE_CLICKS WINDOW TUMBLING (SIZE 1 HOUR) GROUP BY PAGE_ID;
See more at: https://www.confluent.io/blog/using-ksql-to-analyse-query-and-transform-data-in-kafka
You can take the output of KSQL, which is actually just a Kafka topic, and stream that through Kafka Connect, e.g. to Elasticsearch, Cassandra, and so on.
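For the Kafka Streams API itself, a simple map of JSON messages to SQL statements is a good fit. Here is a minimal sketch (the topic names, bootstrap server, and the buildSqlFromJson helper are placeholders for your own setup and business logic):

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class JsonToSqlStream {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "json-to-sql-transformer");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Read the JSON strings that Kafka Connect wrote to the input topic
        KStream<String, String> source = builder.stream("connect-source-topic");

        // Apply the custom business logic to turn each JSON record into a SQL statement
        KStream<String, String> sql = source.mapValues(json -> buildSqlFromJson(json));

        // Write the generated SQL statements to the output topic
        sql.to("sql-output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }

    // Placeholder for your JSON-to-SQL conversion logic
    private static String buildSqlFromJson(String json) {
        return "INSERT INTO target_table VALUES (...)";
    }
}

Because the transformation is stateless, mapValues is all you need; Kafka Streams takes care of the consuming, producing, and scaling across instances for you.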
Upvotes: 3