Aritra Sur Roy

Reputation: 322

Using ksqldb to join data from multiple types of source connectors

We are evaluating ksqlDB as an ETL tool for our organization. Our entire app is hosted on Microsoft Azure, and our organization mostly prefers PaaS offerings. One use case is that we have multiple microservices, each with its own database, and we want to join tables across those databases to produce denormalized data for other tasks.

For example, a Users table contains user data and an Orders table contains all the orders. Users may be stored relationally in MySQL, whereas Orders may be stored as documents in MongoDB. We need to generate a report by joining Orders and Users on user_id. In ksqlDB this can be done by adding a source connector for each database and joining the resulting streams/tables. We can then add a sink connector that writes the joined Users_Orders data to a new MongoDB database. As long as the connectors and the join queries are running, new data added to the sources will also flow into Users_Orders.
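The join described here is a stream-table join: Orders arrive as a stream, and Users are kept as a table that holds the latest row per user_id. The following is a minimal plain-Python sketch of those semantics only (it is not ksqlDB syntax). It assumes the connectors deliver user changes and orders as one ordered event log; the classes `User`, `Order`, `UserOrder`, the function `run_join`, and the fields `name` and `amount` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Union


@dataclass(frozen=True)
class User:
    # Hypothetical change event from the MySQL Users table (via a source connector).
    user_id: int
    name: str


@dataclass(frozen=True)
class Order:
    # Hypothetical event from the MongoDB Orders collection (via a source connector).
    order_id: str
    user_id: int
    amount: float


@dataclass(frozen=True)
class UserOrder:
    # Denormalized row that a sink connector would write to Users_Orders.
    order_id: str
    user_id: int
    name: Optional[str]  # None when no matching user has been seen yet
    amount: float


def run_join(events: Iterable[Union[User, Order]], sink: List[UserOrder]) -> None:
    """Stream-table LEFT JOIN over an ordered event log.

    User events upsert the table (latest value per user_id wins);
    Order events look up the current user row and emit one joined record.
    Updating a user does not re-emit orders that were already joined.
    """
    users: Dict[int, User] = {}  # materialized "table" keyed by user_id
    for event in events:
        if isinstance(event, User):
            users[event.user_id] = event
        else:
            user = users.get(event.user_id)
            sink.append(
                UserOrder(
                    order_id=event.order_id,
                    user_id=event.user_id,
                    name=user.name if user is not None else None,
                    amount=event.amount,
                )
            )


if __name__ == "__main__":
    events = [
        User(1, "alice"),
        Order("o1", 1, 10.0),
        Order("o2", 2, 5.0),  # user 2 not known yet, so name is None
        User(2, "bob"),
        User(1, "alicia"),    # update to user 1
        Order("o3", 1, 7.5),  # joined against the updated row
    ]
    users_orders: List[UserOrder] = []
    run_join(events, users_orders)
    for row in users_orders:
        print(row)
```

As with ksqlDB stream-table joins, only the stream side (orders) triggers output here: updating a user affects orders that arrive afterwards, while rows already written to Users_Orders are not rewritten.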

I read that using ksqlDB in production with Azure Event Hubs is not possible because of licensing issues. So my question is:

Before looking into other products such as Azure HDInsight or Confluent Cloud, is there any way to run ksqlDB to achieve the same solution (for example, by managing our own Kafka cluster)?

Upvotes: 0

Views: 359

Answers (1)

OneCricketeer

Reputation: 191874

You don't necessarily need ksqlDB; you should be able to do something similar with Spark SQL, offered in Azure (Databricks), or with Flink (HDInsight or AKS). You don't necessarily need Kafka / Event Hubs either, since Spark and Flink can read, join, and write MongoDB/JDBC data on their own (with the appropriate plugins).
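To make the "no broker needed" point concrete, here is a minimal plain-Python sketch (not Spark or Flink code) of the kind of batch job such an engine would run: read both sources directly, hash-join on user_id, and produce the denormalized rows. `sqlite3` stands in for the MySQL/JDBC source and a list of dicts stands in for MongoDB documents; the table layout and the function names `load_users` and `batch_join` are hypothetical.

```python
import sqlite3
from typing import Any, Dict, List, Optional


def load_users(conn: sqlite3.Connection) -> Dict[int, str]:
    """Read the relational users table into a user_id -> name lookup.

    sqlite3 is only a stand-in for a JDBC read from MySQL.
    """
    return {user_id: name for user_id, name in conn.execute("SELECT user_id, name FROM users")}


def batch_join(conn: sqlite3.Connection, orders: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hash join: build a lookup from users, then probe it with each order.

    `orders` stands in for documents read from a MongoDB collection. Orders
    without a matching user keep name=None (left-join semantics).
    """
    users = load_users(conn)
    joined: List[Dict[str, Any]] = []
    for doc in orders:
        name: Optional[str] = users.get(doc["user_id"])
        joined.append({**doc, "name": name})
    return joined


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
    orders = [
        {"order_id": "o1", "user_id": 1, "amount": 10.0},
        {"order_id": "o2", "user_id": 3, "amount": 5.0},
    ]
    # In a real job the result would be written to the target MongoDB collection.
    for row in batch_join(conn, orders):
        print(row)
```

Unlike a continuously running stream-table join, a batch job like this recomputes the whole join on each run, so how fresh Users_Orders is depends on how often the job is scheduled.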

The main reason ksqlDB isn't offered as a hosted service by Azure is that it conflicts with the Confluent licensing. That does not prevent you from running it yourself, as long as you adhere to the licensing restriction of not offering the ksqlDB REST API as a publicly available / paid API. I haven't personally tried it, but ksqlDB should work against Event Hubs on its own; I don't think you need to self-manage Kafka as the documentation suggests.

Upvotes: 2
