Reputation: 322
We are evaluating ksqlDB as an ETL tool for our organization. Our entire app is hosted on Microsoft Azure, and our organization generally prefers PaaS offerings. However, one use case is that we have multiple microservices, each with its own database, and we want to join tables across those databases to produce denormalized data for other tasks. For example, a `Users` table contains user data, while an `Orders` table contains all the orders. `Users` may be stored relationally in MySQL, whereas `Orders` may be stored as documents in MongoDB. We need to generate a report by joining `Orders` and `Users` on `user_id`. This can be done in ksqlDB by adding a source connector for each database and joining the resulting streams/tables. We can then add a sink connector that writes the joined `Users_Orders` data to a new MongoDB database. As long as the connectors and joins are running, new data in the sources will also update the joined data in `Users_Orders`.
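To make the intended semantics concrete, here is a rough plain-Python sketch of the join described above. It is not ksqlDB code and does not talk to any database; the class name, the field names (`name`, `order_id`, `total`), and the left-join behaviour for unknown users are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class UsersOrdersJoin:
    """Toy stand-in for a stream-table join keyed on user_id (hypothetical)."""

    # Latest row per user_id, playing the role of the Users table.
    users: dict = field(default_factory=dict)
    # Denormalized output rows, playing the role of the Users_Orders sink.
    users_orders: list = field(default_factory=list)

    def upsert_user(self, user: dict) -> None:
        # A change event from the Users source: keep only the latest version.
        self.users[user["user_id"]] = user

    def on_order(self, order: dict) -> dict:
        # A new event from the Orders source: enrich it with the current user row.
        # Assumed left-join semantics: an unknown user yields user_name=None.
        user = self.users.get(order["user_id"], {})
        joined = {**order, "user_name": user.get("name")}
        self.users_orders.append(joined)
        return joined


join = UsersOrdersJoin()
join.upsert_user({"user_id": 1, "name": "Ada"})
join.on_order({"order_id": "o-1", "user_id": 1, "total": 25.0})
join.on_order({"order_id": "o-2", "user_id": 2, "total": 10.0})  # no matching user yet
```

Each new order is joined against whatever user row is current at that moment, which is the behaviour we would want from the running connectors and joins.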
With Azure Event Hubs, I read that using ksqlDB in production is not possible because of licensing issues. So my question is:
Before looking at other products such as Azure HDInsight or Confluent Cloud, is there any way of running ksqlDB to achieve the same solution (perhaps by managing our own Kafka cluster)?
Upvotes: 0
Views: 359
Reputation: 191874
You don't necessarily need ksqlDB; you should be able to do something similar with Spark SQL (offered in Azure via Databricks) or Flink (on HDInsight or AKS). You don't necessarily need Kafka / Event Hubs either, since Spark and Flink can read, join, and write Mongo/JDBC data on their own (with the appropriate plugins).
The main reason Azure doesn't offer ksqlDB as a hosted service is that doing so would conflict with Confluent's licensing. That does not prevent you from running it yourself, as long as you adhere to the licensing restriction against offering the ksqlDB REST API as a publicly available / paid API. I've not personally tried it, but ksqlDB should work against Event Hubs on its own; I don't think you need to self-manage Kafka, contrary to what the documentation suggests.
Upvotes: 2