Reputation: 201
We are using a Kafka stack to perform extraction, transformation, and loading (ETL) from a set of source databases into a target database. We use Debezium for change data capture (CDC), Kafka Streams for data transformation, and the Kafka Connect JDBC Sink for writing data to the target database. In our scenario, multiple source databases are monitored by Debezium.
Initially, the JDBC Sink consumed a single topic (with several partitions) per table. This caused issues, because the data from each source database must be inserted into the destination database independently.
To address this, we created one topic per source database on the JDBC Sink side and customized the JDBC Sink to subscribe to topics via a regular expression (regex). We also implemented logic to pause consumption from any topic that contains inconsistent records, and this pausing approach has been working well for us.
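For context, the regex-subscribing sink looks roughly like the sketch below (the connector name, regex, and connection details are placeholders, and the custom pause-on-inconsistent-records logic is not shown):

```json
{
  "name": "jdbc-sink-all-sources",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics.regex": "target_db_.*",
    "connection.url": "jdbc:postgresql://target-host:5432/warehouse",
    "connection.user": "etl_user",
    "connection.password": "********",
    "insert.mode": "upsert",
    "pk.mode": "record_key",
    "auto.create": "true"
  }
}
```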
After deploying the new solution in production, we tried increasing the number of tasks in the JDBC Sink from one to ten. However, we ran into an issue: because the new model divides the data across topics rather than partitions, task 0 subscribes to all the topics while the remaining tasks sit idle.
Is there a way to make Kafka Connect (i.e. the underlying Kafka consumer) rebalance topics across tasks automatically? We have not tried subscribing to explicit topic names instead of a regex. Would Kafka handle that approach better?
Upvotes: 0
Views: 189
Reputation: 191874
There is no rebalance API. Instead, you can deploy 10 connectors, each with one task, and set consumer.override.group.id to the same value for each sink. Otherwise, you are limited to the autogenerated, separate connect-${name} groups.
The connectors will then operate as a standard consumer group. If one connector stops (which you can also monitor per connector via the /status endpoint), the group will rebalance.
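A minimal sketch of one such sink, assuming the Confluent JDBC sink and a worker configured with connector.client.config.override.policy=All so that the consumer override is accepted (all names, the regex, and connection details are placeholders):

```json
{
  "name": "jdbc-sink-01",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics.regex": "target_db_.*",
    "consumer.override.group.id": "jdbc-sink-shared",
    "connection.url": "jdbc:postgresql://target-host:5432/warehouse",
    "insert.mode": "upsert",
    "pk.mode": "record_key"
  }
}
```

Deploy ten copies of this, changing only the name (e.g. jdbc-sink-01 through jdbc-sink-10). Because they all share one group.id and the same subscription, the regular consumer group protocol spreads the topic partitions across them, and if any one of them stops (observable via GET /connectors/&lt;name&gt;/status) the group rebalances onto the rest.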
Kafka Sink Connector with custom consumer-group name
Upvotes: 0