ha9u63a7

Reputation: 6854

Apache Kafka Connect/Streams API for synchronising database tables

I was reading through the Kafka documentation on the Connect API and trying to relate it to my problem domain. I have multiple databases which have common tables that I need to synchronise on any updates/inserts/deletes. An example is this:

1) Someone updates table "order_history" in DB1 - I want the update to be communicated to DB2/DB3 etc.

2) Someone inserts a record into "purchase_order" - I want the insert to be communicated to DB2/DB3 etc. so that the same insert happens in DB2/DB3 etc.

3) The tables will be in all the DBs - so no missing table issue there.

These are only to be done on a specific set of tables, not the entire database. What I understand from Connect API documentation is that I need to provide the following:

1) Source Connector imports data - from SQL/File system to Kafka topics

2) Sink Connector exports data - from Kafka topics to SQL/File system/Hadoop FS

But then I am trying to understand how this is relevant to syncing multiple database tables on any inserts/updates/deletes - because the Connect API still involves writing to and reading from topics, which might not necessarily be what my use case is. I have also looked at Kafka Streams, but it seems to be a tool for data aggregation and counter management, again probably not my use case.

Could anyone explain whether my assumption is correct, and whether I should still explore the Streams/Connect APIs?

Regards,

Upvotes: 0

Views: 2220

Answers (1)

Robin Moffatt

Reputation: 32110

Yes, you can use Kafka Connect to apply changes from one database to another. You would typically use a CDC tool that takes events directly from the redo/transaction log on your source database and pushes each event to a Kafka topic. Examples include Oracle GoldenGate and the Debezium project.

Once on a Kafka topic, you can then use Kafka Connect's JDBC Sink to push these changes to a target database.
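
For illustration, here is a minimal sketch of what registering those two connectors might look like, using the Kafka Connect REST API. The hostnames, credentials, connector names and table/topic names are all placeholders, exact property names vary between Debezium/connector versions, and a real deployment needs more settings (schema history topic, converters, error handling), so treat it purely as a starting point:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/**
 * Sketch: register a CDC source (Debezium MySQL) and a JDBC sink with a
 * Kafka Connect worker via its REST API (assumed here at localhost:8083).
 */
public class RegisterConnectors {

    // Captures inserts/updates/deletes from DB1's binlog into topics
    // named <server>.<database>.<table>, e.g. db1.mydb.order_history
    private static final String SOURCE = """
        {
          "name": "db1-cdc-source",
          "config": {
            "connector.class": "io.debezium.connector.mysql.MySqlConnector",
            "database.hostname": "db1.example.com",
            "database.port": "3306",
            "database.user": "debezium",
            "database.password": "secret",
            "database.server.id": "184054",
            "database.server.name": "db1",
            "table.include.list": "mydb.order_history,mydb.purchase_order"
          }
        }
        """;

    // Applies those change events to DB2; the unwrap transform flattens
    // Debezium's change-event envelope into plain rows for the JDBC sink
    private static final String SINK = """
        {
          "name": "db2-jdbc-sink",
          "config": {
            "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
            "connection.url": "jdbc:postgresql://db2.example.com:5432/mydb",
            "connection.user": "kafka",
            "connection.password": "secret",
            "topics": "db1.mydb.order_history,db1.mydb.purchase_order",
            "transforms": "unwrap",
            "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
            "insert.mode": "upsert",
            "pk.mode": "record_key",
            "pk.fields": "id",
            "auto.create": "true"
          }
        }
        """;

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        for (String connector : new String[] {SOURCE, SINK}) {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:8083/connectors"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(connector))
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }
}
```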

Where this may not meet your requirement is if you also want to mirror deletes directly in your target: CDC records usually carry a column indicating the operation (e.g. "D" for delete), so instead of a delete you will get a row inserted on the target with this value.

However, if you are looking to literally mirror a set of tables from one DB to another DB, you should be looking at a database replication tool, not Kafka.

Where Kafka fits is if you want to stream events from one place to another (and want to store delete events rather than simply apply them as deletes on the target), with the option of using that same data to land in other targets or to drive applications directly. That could be Kafka Streams, a Kafka Consumer - or any of the multitude of other technologies and tools out there that integrate with Kafka.
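
For example, a bare-bones Kafka Consumer reading the change-event topic (broker address and topic name below are placeholders, following the Debezium-style <server>.<db>.<table> naming) and doing nothing more than printing each event - the point being that any downstream logic could go in its place:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

// Minimal consumer of the change-event topic: each record is one
// insert/update/delete captured from the source database.
public class OrderHistoryChangeConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-history-audit");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("db1.mydb.order_history"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // The value is the raw change event; react however you like
                    System.out.printf("key=%s value=%s%n", record.key(), record.value());
                }
            }
        }
    }
}
```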

In terms of Kafka Streams, this would be useful if you want to do some processing on the data you're pulling in from your source database - for example, joins, filtering or aggregation. As well as writing Java code directly with Kafka Streams, you now have the option of using a SQL-like interface on top of Kafka, with KSQL from Confluent.
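
To make that concrete, here is a minimal Kafka Streams sketch (topic names are placeholders, and it assumes the change events arrive as plain JSON strings) that filters the change stream down to delete events and writes them to a derived topic; joins and aggregations build on the same API:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

// Minimal Kafka Streams app: read the change-event topic, keep only the
// events we care about, and write them to a derived topic.
public class PurchaseOrderFilter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "purchase-order-filter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> changes = builder.stream("db1.mydb.purchase_order");

        // Keep only delete events (Debezium marks them with "op":"d") -
        // just an illustration of per-record processing on the stream
        changes.filter((key, value) -> value != null && value.contains("\"op\":\"d\""))
               .to("purchase_order_deletes");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

The KSQL equivalent of a filter like this would be a single CREATE STREAM ... AS SELECT statement, with no Java code to write at all.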

Upvotes: 2
