Julian Gordon

Reputation: 11

Performance issues when replicating tables with Kafka Connect & Debezium to Kafka

I'm experiencing performance issues when replicating tables using Debezium and Kafka Connect.

The slow replication only occurs during the initial snapshot of the database. One table I tested with contains 3.4M rows, and replicating it took 2 hours to complete.

During the snapshot, the entire database was locked, and I was unable to commit data to other tables that were not being synced at the time.

My Debezium connector configuration (deployed via a curl request):

{
  "name": "debezium-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "redacted",
    "database.port": "3306",
    "database.user": "redacted",
    "database.password": "redacted",
    "database.server.id": "54005",
    "database.server.name": "redacted",
    "database.include.list": "redacted",
    "table.include.list": "redacted",
    "database.history.consumer.security.protocol": "SSL",
    "database.history.producer.security.protocol": "SSL",
    "database.history.kafka.bootstrap.servers": "redacted",
    "database.history.kafka.topic": "schema-changes.debezium-test",
    "snapshot.mode": "when_needed",
    "max.queue.size": 81290,
    "max.batch.size": 20480
  }
}

Kafka Connect settings I changed:

CONNECT_OFFSET_FLUSH_INTERVAL_MS: 10000
CONNECT_OFFSET_FLUSH_TIMEOUT_MS: 60000

Questions:
1. How can I improve performance during the initial snapshot of the database?
2. How can I replicate a limited number of tables from a database without locking the entire database?

Upvotes: 0

Views: 3410

Answers (1)

Jiri Pechanec

Reputation: 1976

If you can make sure that the database schema will not change during the snapshot process, you can avoid locking the database with the snapshot.locking.mode option: https://debezium.io/documentation/reference/1.3/connectors/mysql.html#mysql-property-snapshot-locking-mode
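A minimal sketch of applying that change, assuming the Kafka Connect REST API listens on localhost at its default port 8083 (the URL is an assumption; adjust it to your worker). It sets snapshot.locking.mode to "none", which tells the MySQL connector not to take table locks during the snapshot, and builds (but does not send) a PUT /connectors/debezium-connector/config request. That endpoint replaces the whole configuration, so the short dict below only stands in for the full config from the question.

```python
import json
import urllib.request

# Assumption: Kafka Connect REST API on localhost at its default port, 8083.
CONNECT_URL = "http://localhost:8083"
CONNECTOR_NAME = "debezium-connector"

# Stand-in for the full connector config from the question. A PUT replaces the
# whole config, so in practice every key (hostname, credentials, topics...) must be included.
current_config = {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "snapshot.mode": "when_needed",
}


def without_snapshot_locks(config: dict) -> dict:
    """Return a copy of config that asks Debezium not to take table locks during the snapshot.

    Only safe if no schema changes (DDL) happen while the snapshot runs.
    """
    updated = dict(config)
    updated["snapshot.locking.mode"] = "none"
    return updated


def build_config_update(config: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT /connectors/{name}/config request."""
    return urllib.request.Request(
        f"{CONNECT_URL}/connectors/{CONNECTOR_NAME}/config",
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_config_update(without_snapshot_locks(current_config))
print(request.get_method(), request.full_url)
# Send it with urllib.request.urlopen(request) once the URL points at your Connect worker.
```

The helper returns a copy rather than mutating the original dict, so the unchanged config is still available if you need to roll back.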

Also check the min.row.count.to.stream.results option (https://debezium.io/documentation/reference/1.3/connectors/mysql.html#mysql-property-min-row-count-to-stream-results); setting it properly may also improve performance.
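According to the Debezium documentation, this option sets the minimum number of rows a table must contain before the connector streams the snapshot query results instead of reading them into memory (faster, but memory-hungry); 0 skips the size check and always streams. The function below is a simplified model of that documented behavior for reasoning about the setting, not Debezium's actual code, and the exact boundary comparison is an assumption. Whether buffering a 3.4M-row table helps depends on the memory available to the Connect worker.

```python
def snapshot_read_strategy(table_rows: int, min_row_count_to_stream_results: int = 1000) -> str:
    """Simplified model of the documented min.row.count.to.stream.results behavior.

    This is an illustration, not Debezium's source: the exact boundary
    comparison (>= vs >) is an assumption.
    """
    if min_row_count_to_stream_results == 0:
        # 0 skips the table size check: every table is streamed.
        return "stream"
    if table_rows >= min_row_count_to_stream_results:
        # Large table: rows are streamed from MySQL (lower memory use, can be slower).
        return "stream"
    # Small table: the result set is read into memory (faster, needs more memory).
    return "buffer"


for rows, threshold in [(3_400_000, 1000), (500, 1000), (3_400_000, 5_000_000), (10, 0)]:
    print(f"{rows:>9} rows, threshold {threshold}: {snapshot_read_strategy(rows, threshold)}")
```

With a threshold of 1000, the 3.4M-row table from the question falls on the streaming path; raising the threshold above the table's size would switch it to the in-memory path.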

You can also increase max.batch.size together with max.queue.size even further than your current values.
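Debezium requires max.queue.size to be larger than max.batch.size, so the two are best raised together. Below is a small hypothetical helper (not part of Debezium or Kafka Connect) that scales both by the same factor and refuses a result that breaks that rule. Note that a bigger queue holds more change events in the connector's memory.

```python
def scale_buffer_settings(config: dict, factor: int) -> dict:
    """Hypothetical helper: multiply max.batch.size and max.queue.size by factor.

    Keeps max.queue.size strictly larger than max.batch.size, as Debezium requires.
    Values are returned as strings, the form Kafka Connect config maps use.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    batch = int(config["max.batch.size"]) * factor
    queue = int(config["max.queue.size"]) * factor
    if queue <= batch:
        raise ValueError("max.queue.size must be larger than max.batch.size")
    updated = dict(config)
    updated["max.batch.size"] = str(batch)
    updated["max.queue.size"] = str(queue)
    return updated


current = {"max.queue.size": 81290, "max.batch.size": 20480}
print(scale_buffer_settings(current, 2))
# {'max.queue.size': '162580', 'max.batch.size': '40960'}
```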

Upvotes: 1
