Reputation: 413
I am working on a change data capture (CDC) project. I have a MySQL database, I use Debezium to capture all the changes and send them to Kafka, and later I read everything from Spark and send it to Apache Phoenix over JDBC.
I am using Debezium with the topic-rerouting option, which sends the changes of all the tables to a single Kafka topic. With this configuration I am sure I can read that one Kafka topic from Spark in order.
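(Simplified sketch of the read side with Spark Structured Streaming; the topic and broker names are placeholders, and the job needs the spark-sql-kafka package on the classpath. Note Kafka only guarantees order per partition, so the merged topic has a single partition:)

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("cdc-reader").getOrCreate()

    # Read the single rerouted topic. Ordering is only guaranteed within a
    # partition, hence the one-partition topic for a strict total order.
    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "kafka:9092")   # placeholder broker
        .option("subscribe", "cdc.all-tables")             # placeholder topic
        .option("startingOffsets", "earliest")
        .load()
    )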
But my question is: if I use Debezium without the rerouting option, so that each table's changes go to a different Kafka topic, how can I guarantee that I read the events of all the topics in the correct order?
I know I can use Spark to order them, for example by timestamp, but if, say, one Kafka topic goes offline for 10 minutes because a problem arises while the other topics keep working, I will end up with an ordering problem in Spark.
How can I deal with this problem?
Upvotes: 1
Views: 1885
Reputation: 201
I resolved this problem with the following Debezium configuration:
{
  "name": "name-connector",
  "config": {
    "plugin.name": "pgoutput",
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "0.0.0.0",
    "database.port": "5433",
    "database.user": "postgres",
    "database.password": "******",
    "database.dbname": "database",
    "database.server.name": "database",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.database",
    "decimal.handling.mode": "string",
    "time.precision.mode": "connect",
    "tombstones.on.delete": false,
    "transforms": "routerTopic",
    "transforms.routerTopic.type": "io.debezium.transforms.ByLogicalTableRouter",
    "transforms.routerTopic.topic.regex": "database.public.(.*)",
    "transforms.routerTopic.topic.replacement": "database.public"
  }
}
Topic routing is configured with transforms.routerTopic.topic.regex and transforms.routerTopic.topic.replacement: every topic whose name matches the regex is rerouted to the replacement topic, so all per-table topics collapse into one.
https://debezium.io/documentation/reference/0.10/configuration/topic-routing.html
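As a rough illustration (a Python analogy, not Debezium's actual implementation, which uses Java regexes; the table names below are made up), the rewrite behaves like a regex substitution:

    import re

    # Both per-table topics match the regex and are rewritten to the
    # single replacement topic "database.public".
    for topic in ["database.public.customers", "database.public.orders"]:
        print(re.sub(r"database\.public\.(.*)", "database.public", topic))
    # prints "database.public" twice: all tables land in one topic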
Upvotes: 0