Reputation: 113
I am designing the flow below and want to know whether I am going about it the right way. I want to skip any unnecessary steps. I have Hadoop running with Spark as the execution engine.
Upvotes: 0
Views: 205
Reputation: 191963
Use Debezium to pull change events from the RDBMS. All writes then end up in Kafka, so you don't end up with "batches" at all. (Sqoop is a retired Apache project.)
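As a minimal sketch, a Debezium MySQL source connector could be registered with Kafka Connect using a config like the one below. This assumes Debezium 2.x property names; the hostname, credentials, and table name are placeholders, not values from the question:

```json
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql.example.internal",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "CHANGE_ME",
    "database.server.id": "184054",
    "topic.prefix": "inventory",
    "table.include.list": "inventory.orders",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "schema-changes.inventory"
  }
}
```

Posting this JSON to the Kafka Connect REST API (`POST /connectors`) starts streaming row-level changes from the included tables into Kafka topics named after the `topic.prefix`.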
Use Apache Pinot or Druid to ingest Kafka directly. Then you don't need HDFS.
You can query Pinot / Druid using SQL. Or you can use Presto in place of Hive/SparkSQL, and you should be able to connect Superset to Presto rather than to an intermediate RDBMS.
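For example, once the Kafka topic is ingested into a Pinot or Druid table, Superset (or any SQL client via Presto) could run an aggregation like this directly; the `orders` table and its columns are illustrative, not from the question:

```sql
-- Hypothetical table fed from the Kafka topic via real-time ingestion
SELECT customer_id,
       COUNT(*)          AS order_count,
       SUM(order_total)  AS revenue
FROM orders
WHERE order_ts >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY customer_id
ORDER BY revenue DESC
LIMIT 20;
```

The point of the design is that this query hits the real-time store directly, with no HDFS landing zone or intermediate RDBMS in between.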
Upvotes: 1