Reputation: 21
First-time user of Debezium here: I get only around 1000 messages per MINUTE through Debezium, which is very slow compared to benchmarks posted online. There is no throttling on Kafka Connect, MySQL, or the Kafka broker, so I am not sure what I am doing wrong. I will post the config here for reference.
Config of Kafka-Connect Worker:
-e CONNECT_GROUP_ID="quickstart" \
-e CONNECT_CONFIG_STORAGE_TOPIC="quickstart-config" \
-e CONNECT_OFFSET_STORAGE_TOPIC="quickstart-offsets" \
-e CONNECT_STATUS_STORAGE_TOPIC="quickstart-status" \
-e CONNECT_KEY_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
-e CONNECT_VALUE_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
-e CONNECT_INTERNAL_KEY_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
-e CONNECT_INTERNAL_VALUE_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
-e CONNECT_REST_ADVERTISED_HOST_NAME="localhost" \
-e CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR=1 \
-e CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR=1 \
-e CONNECT_STATUS_STORAGE_REPLICATION_FACTOR=1 \
I use the default config for the Debezium MySQL connector; it is registered roughly as shown below.
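For reference, a sketch of the registration call (hostnames, credentials, and names are placeholders from the Debezium tutorial; the property names follow Debezium 2.x, while older releases use database.server.name and database.history.* instead):

curl -X POST -H "Content-Type: application/json" http://localhost:8083/connectors -d '{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "184054",
    "topic.prefix": "dbserver1",
    "database.include.list": "inventory",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "schema-changes.inventory"
  }
}'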
Upvotes: 2
Views: 4779
Reputation: 22681
I assume this is the first time you are connecting Debezium to MySQL, so the connector is still performing its initial snapshot; you could tune the snapshot.max.threads setting.
Specifies the number of threads that the connector uses when performing an initial snapshot. To enable parallel initial snapshots, set the property to a value greater than 1. In a parallel initial snapshot, the connector processes multiple tables concurrently.
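For example, a sketch of updating the connector through the Connect REST API (the connector name, the placeholder connection settings, and the value 4 are illustrative; PUT replaces the whole config, so include all of your existing properties):

curl -X PUT -H "Content-Type: application/json" \
  http://localhost:8083/connectors/inventory-connector/config \
  -d '{
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "184054",
    "topic.prefix": "dbserver1",
    "database.include.list": "inventory",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "schema-changes.inventory",
    "snapshot.max.threads": "4"
  }'

Note that this only speeds up the initial snapshot phase; once the connector is streaming from the binlog, it reads with a single thread.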
Also, tune the partition count! More partitions allow more parallelism on the consuming side; in my case I went from 2 to 12 partitions on the topic.
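With the stock Kafka CLI that looks like this (the topic name is illustrative):

kafka-topics.sh --bootstrap-server localhost:9092 \
  --alter --topic dbserver1.inventory.customers --partitions 12

Keep in mind that adding partitions changes which partition a given key hashes to, so per-key ordering is not preserved across the change.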
Upvotes: 0
Reputation: 1298
The best option is to reconfigure the connector with Avro serialization to reduce the size of the messages and to track schema changes. This should give you a very significant improvement.
The Avro binary format is compact and efficient. Avro schemas make it possible to ensure that each record has the correct structure. Avro’s schema evolution mechanism enables schemas to evolve. This is essential for Debezium connectors, which dynamically generate each record’s schema to match the structure of the database table that was changed.
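With the Confluent Connect image used in the question, switching the worker to Avro could look like this (a sketch: it assumes a Schema Registry reachable at the placeholder URL and the Avro converter being on the worker's plugin path; the converter ships with the Confluent images, otherwise install kafka-connect-avro-converter):

-e CONNECT_KEY_CONVERTER="io.confluent.connect.avro.AvroConverter" \
-e CONNECT_KEY_CONVERTER_SCHEMA_REGISTRY_URL="http://schema-registry:8081" \
-e CONNECT_VALUE_CONVERTER="io.confluent.connect.avro.AvroConverter" \
-e CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL="http://schema-registry:8081" \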
To see what difference removing the schema from each message makes in your case, without Avro and without setting up a Schema Registry, set the settings below to false. Do not use it this way in production.
The default behavior is that the JSON converter includes the record’s message schema, which makes each record very verbose. If you want records to be serialized with JSON, consider setting the following connector configuration properties to false:
key.converter.schemas.enable
value.converter.schemas.enable
Setting these properties to false excludes the verbose schema information from each record.
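With the worker configuration from the question, that corresponds to two extra environment variables (worker-wide; the same thing can be done per connector with the key.converter.schemas.enable and value.converter.schemas.enable properties in the connector config):

-e CONNECT_KEY_CONVERTER_SCHEMAS_ENABLE="false" \
-e CONNECT_VALUE_CONVERTER_SCHEMAS_ENABLE="false" \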
See also: "Kafka Connect Deep Dive – Converters and Serialization Explained"
Upvotes: 1