Reputation: 1
We are struggling to find some "best practice" regarding the use of Kafka Connect for CDC. What we are trying to achieve: extract online redo logs from Oracle through Kafka Connect. We have ~700 different tables, ranging from a couple of rows to ~40M rows for the biggest ones.
What we thought of using:
Which is the better approach: one connector per table, meaning we end up with 700 connectors (plus 700 topics for the related DDL under "database.server.name")? Because if we keep only one connector for all tables, the problem is that the work will not be parallelised.
I tried adding 3 Kafka Connect workers, but the issue is still the same: only one table is processed at a time.
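For context, here is roughly what a single Debezium Oracle connector covering many tables looks like; hostnames, credentials, and table names are placeholders, and property names follow the Debezium 1.x convention that matches the "database.server.name" mentioned above:

```json
{
  "name": "oracle-cdc-connector",
  "config": {
    "connector.class": "io.debezium.connector.oracle.OracleConnector",
    "tasks.max": "1",
    "database.hostname": "oracle-host.example.com",
    "database.port": "1521",
    "database.user": "c##dbzuser",
    "database.password": "********",
    "database.dbname": "ORCLCDB",
    "database.server.name": "myserver",
    "table.include.list": "INVENTORY.CUSTOMERS,INVENTORY.ORDERS",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.myserver"
  }
}
```

Note that a log-based source connector like this reads a single ordered redo stream, which is why adding workers does not spread one connector's tables across them.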
Any best practice or real-world experience reports would be much appreciated.
Many thanks,
Upvotes: 0
Views: 1204
Reputation: 1
For starters, you don't want one connector per table. Debezium does log-based replication, so additional connectors will probably not be faster than a single connector with many tables, and will likely put undue stress on your database. The bigger factor in performance is whether you use the LogMiner interface or a method that extracts from the redo logs directly. Going through the LogMiner SQL interface has some performance overhead. I think there is now an OSS tool that can work with Debezium to read the redo logs directly, but traditionally doing so required licensing GoldenGate.
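The LogMiner-versus-direct-extraction choice above surfaces in the Debezium Oracle connector as a single property. A minimal sketch (other required connection properties omitted); `logminer` is the default, while `xstream` relies on the Oracle XStream API, which requires a GoldenGate license:

```json
{
  "database.connection.adapter": "logminer"
}
```

So before multiplying connectors, it is worth measuring whether the LogMiner mining sessions themselves are the bottleneck.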
Upvotes: 0