Build a solution for Kafka+Spark for RDBMS data

Question

My current project runs on mainframes with DB2 as its database. We have 70 databases with nearly 60 tables in each. Our architect proposed using Kafka with Spark Streaming to process the data. How good is Kafka at reading data from RDBMS tables? Do we read the data directly from the tables with Kafka, or is there another way to get data from an RDBMS into Kafka? If there is a better solution, your suggestions would help a lot.

gorros · Accepted Answer

Do not read directly from the database; it will put additional load on it. I would suggest two approaches:

1. Send new data both to the databases and to Kafka, or send it to Kafka and then consume it from there for processing.
2. Read the changes from the database's write-ahead log and send them to Kafka for further processing. I know this is possible for MySQL with Maxwell, but I am not sure about DB2.
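To illustrate the log-based approach, here is a minimal Python sketch of the consuming side: it rebuilds table state from row-level change events instead of querying the database. It assumes events are JSON shaped like Maxwell's output (`database`, `table`, `type`, `data`). The sample events, the `apply_change_events` helper, and the assumption that every table's primary key column is `id` are hypothetical. In a real pipeline the events would come from a Kafka topic, not a Python list.

```python
import json

# Hypothetical change events shaped like Maxwell's JSON output for MySQL.
# "type" is insert/update/delete; "data" holds the row after the change
# (for a delete, the row that was removed).
RAW_EVENTS = [
    '{"database": "shop", "table": "orders", "type": "insert", "ts": 1700000000, "data": {"id": 1, "status": "new"}}',
    '{"database": "shop", "table": "orders", "type": "insert", "ts": 1700000001, "data": {"id": 2, "status": "new"}}',
    '{"database": "shop", "table": "orders", "type": "update", "ts": 1700000005, "data": {"id": 1, "status": "shipped"}, "old": {"status": "new"}}',
    '{"database": "shop", "table": "orders", "type": "delete", "ts": 1700000009, "data": {"id": 2, "status": "new"}}',
]


def apply_change_events(raw_events, key_column="id"):
    """Rebuild table state from a stream of row-level change events.

    key_column is an assumption: the primary-key column of every table.
    Returns {(database, table): {primary_key: row}}.
    """
    replicas = {}
    for raw in raw_events:
        event = json.loads(raw)
        table = replicas.setdefault((event["database"], event["table"]), {})
        row = event["data"]
        pk = row[key_column]
        if event["type"] in ("insert", "update"):
            # Upsert: the event carries the full row after the change.
            table[pk] = row
        elif event["type"] == "delete":
            table.pop(pk, None)
        # Any other event type is ignored in this sketch.
    return replicas


if __name__ == "__main__":
    state = apply_change_events(RAW_EVENTS)
    print(state[("shop", "orders")])  # {1: {'id': 1, 'status': 'shipped'}}
```

Because each event carries the full row, the consumer never has to go back to the source database, which is the point of reading the log rather than the tables.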

You can use Spark Streaming or Kafka Streams depending on your needs.
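Whichever engine you pick, the processing is often a stateful aggregation over the change stream. The sketch below is plain Python, not the Spark or Kafka Streams API. It only shows the kind of computation such a job might perform: counting change events per table in 60-second tumbling windows, with state carried across micro-batches. `WindowedCounter` and the event fields `table` and `ts` (epoch seconds) are assumptions for illustration.

```python
from collections import defaultdict


class WindowedCounter:
    """Running count of change events per (table, window start).

    State persists across micro-batches, loosely mirroring a stateful
    windowed aggregation in a streaming engine.
    """

    def __init__(self, window_seconds=60):
        self.window_seconds = window_seconds
        self.counts = defaultdict(int)

    def process_batch(self, events):
        """Add one micro-batch of events and return the updated counts."""
        for event in events:
            ts = event["ts"]
            # Tumbling window: each event falls in [start, start + window_seconds).
            window_start = ts - ts % self.window_seconds
            self.counts[(event["table"], window_start)] += 1
        return dict(self.counts)


if __name__ == "__main__":
    counter = WindowedCounter(window_seconds=60)
    # Small timestamps for readability.
    counter.process_batch([{"table": "orders", "ts": 0}, {"table": "orders", "ts": 30}])
    result = counter.process_batch([{"table": "orders", "ts": 75}, {"table": "customers", "ts": 10}])
    for key, n in sorted(result.items()):
        print(key, n)
    # ('customers', 0) 1
    # ('orders', 0) 2
    # ('orders', 60) 1
```

This sketch never evicts old windows, so its state grows without bound. Spark Structured Streaming (through watermarks) and Kafka Streams (through window retention and grace periods) provide ways to bound that state.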
