My idea is to use the Debezium embedded connector with Spring Boot to connect to a SQL Server database. The only issue I see is how to keep track of the offset, ideally without using a database. Are there any examples, or has anyone faced a similar problem? The connection with Debezium looks something like this:
```java
public String buildDebeziumConnectorString() {
    StringBuilder sb = new StringBuilder(SQLSERVER_CONNECTOR_NAME);
    sb.append("?databaseHostName=").append(properties.getDatabaseHostName())
      .append("&databasePort=").append(properties.getDatabasePort())
      .append("&databaseUser=").append(properties.getDatabaseUser())
      .append("&databasePassword=").append(properties.getDatabasePassword())
      .append("&databaseServerName=").append(properties.getDatabaseServerName())
      .append("&includeSchemaChanges=").append(properties.isIncludeSchemaChanges())
      .append("&databaseDbname=").append(properties.getDatabaseDbname())
      .append("&tableWhitelist=").append(properties.getTableWhitelist())
      .append("&offsetStorageFileName=").append(properties.getOffsetStorageFileName())
      .append("&databaseHistoryFileFilename=").append(properties.getDatabaseHistoryFileFilename());
    return sb.toString();
}
```
In particular, the `&offsetStorageFileName=` parameter is the part that concerns me most.
I am in a distributed environment, and since I'm already using Kafka, I'd like to use it to keep track of offsets. Do you know whether this is possible with the Debezium embedded connector?
You are using Debezium, so you already "have a database" (the one you are pulling changes from).
In general, you wouldn't use standalone/embedded mode, precisely because it leaves you managing your own offsets in a stateful way.
Instead, you would run Kafka Connect / Debezium Server in "distributed mode", in which case Kafka is responsible for storing the connector configs and offsets.
In other words, using files assumes you are running only one instance on a static server, not on ephemeral/stateless ones, since that file wouldn't be shared with other consumer processes (for example, in a highly available deployment).
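If you do keep the plain Debezium embedded engine (outside Camel), the file store is not the only option: its `offset.storage` property accepts Kafka Connect's `KafkaOffsetBackingStore`, and the SQL Server connector's history can use `KafkaDatabaseHistory`. The sketch below only builds the `Properties` object; the property names follow the Debezium 1.x embedded engine and SQL Server connector docs, while the engine name, topic names, hosts and credentials are hypothetical placeholders. Keeping offsets in a topic lets a restarted or replacement instance resume where the previous one stopped; it does not make it safe to run the same connector on several instances at once.

```java
import java.util.Properties;

/**
 * Sketch: Debezium 1.x embedded-engine configuration that keeps offsets and
 * database history in Kafka topics instead of local files.
 * All names and values marked "hypothetical" are placeholders.
 */
public class EmbeddedKafkaOffsetsConfig {

    public static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("name", "sqlserver-embedded"); // hypothetical engine name
        props.setProperty("connector.class", "io.debezium.connector.sqlserver.SqlServerConnector");

        // Offsets go to a Kafka topic instead of offset.storage.file.filename
        props.setProperty("offset.storage", "org.apache.kafka.connect.storage.KafkaOffsetBackingStore");
        props.setProperty("offset.storage.topic", "debezium-offsets"); // hypothetical topic
        props.setProperty("offset.storage.partitions", "1");
        props.setProperty("offset.storage.replication.factor", "3");
        props.setProperty("bootstrap.servers", bootstrapServers); // Kafka used by the offset store

        // Schema/database history also goes to a Kafka topic instead of a file
        props.setProperty("database.history", "io.debezium.relational.history.KafkaDatabaseHistory");
        props.setProperty("database.history.kafka.topic", "debezium-history"); // hypothetical topic
        props.setProperty("database.history.kafka.bootstrap.servers", bootstrapServers);

        // Source database connection (hypothetical placeholder values)
        props.setProperty("database.hostname", "sqlserver.example.com");
        props.setProperty("database.port", "1433");
        props.setProperty("database.user", "debezium");
        props.setProperty("database.password", "secret");
        props.setProperty("database.dbname", "inventory");
        props.setProperty("database.server.name", "server1");
        props.setProperty("table.whitelist", "dbo.customers");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("kafka:9092");
        // Print which offset store this configuration selects
        System.out.println(props.getProperty("offset.storage"));
    }
}
```

In an application, such a `Properties` object would be handed to the engine builder, e.g. `DebeziumEngine.create(Json.class).using(props).notifying(...)`, instead of configuring `offset.storage.file.filename`.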
Since you appear to be using Apache Camel, note that the `offsetStorageFileName` parameter isn't required: there is also an `offsetStorage` option that can be set to a Kafka-based store, plus related settings for the offset topic (`offsetStorageTopic`), its replication factor, etc. Similarly, the database history should also use a topic. The same settings can be found in the Debezium docs.
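Applied to the builder from the question, this means dropping `offsetStorageFileName` and `databaseHistoryFileFilename` and selecting the Kafka-backed classes instead. The following is a sketch, not a tested configuration: the `offsetStorage*` and `databaseHistory*` option names are those of the Camel Debezium SQL Server component for Debezium 1.x, while the `DebeziumProps` record, the topic names, and passing the offset store's `bootstrap.servers` through `additionalProperties.` are assumptions to check against the docs for your Camel version.

```java
/**
 * Sketch: the question's URI builder for the Camel debezium-sqlserver endpoint,
 * switched from file-based offsets/history to Kafka topics.
 * DebeziumProps and the topic names are hypothetical stand-ins.
 */
public class CamelDebeziumUri {

    private static final String OFFSET_TOPIC = "debezium-offsets";   // hypothetical
    private static final String HISTORY_TOPIC = "debezium-history";  // hypothetical

    /** Hypothetical replacement for the question's `properties` bean. */
    public record DebeziumProps(String hostName, int port, String user, String password,
                                String serverName, String dbName, String tableWhitelist,
                                String bootstrapServers) {}

    public static String build(String endpoint, DebeziumProps p) {
        StringBuilder sb = new StringBuilder(endpoint);
        sb.append("?databaseHostName=").append(p.hostName())
          .append("&databasePort=").append(p.port())
          .append("&databaseUser=").append(p.user())
          .append("&databasePassword=").append(p.password())
          .append("&databaseServerName=").append(p.serverName())
          .append("&databaseDbname=").append(p.dbName())
          .append("&tableWhitelist=").append(p.tableWhitelist())
          // Offsets in a Kafka topic instead of offsetStorageFileName
          .append("&offsetStorage=org.apache.kafka.connect.storage.KafkaOffsetBackingStore")
          .append("&offsetStorageTopic=").append(OFFSET_TOPIC)
          .append("&offsetStoragePartitions=1")
          .append("&offsetStorageReplicationFactor=3")
          // Assumption: Kafka connection for the offset store passed via additionalProperties
          .append("&additionalProperties.bootstrap.servers=").append(p.bootstrapServers())
          // Database history in a Kafka topic instead of databaseHistoryFileFilename
          .append("&databaseHistory=io.debezium.relational.history.KafkaDatabaseHistory")
          .append("&databaseHistoryKafkaTopic=").append(HISTORY_TOPIC)
          .append("&databaseHistoryKafkaBootstrapServers=").append(p.bootstrapServers());
        return sb.toString();
    }

    public static void main(String[] args) {
        DebeziumProps p = new DebeziumProps("sqlserver.example.com", 1433, "debezium", "secret",
                "server1", "inventory", "dbo.customers", "kafka:9092");
        System.out.println(build("debezium-sqlserver:myConnector", p));
    }
}
```

With this, every instance that may take over the route reads the same offset topic, so no local file has to be shared between machines.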
> in a distributed environment and since I'm already using Kafka

Kafka can be used in a "non-distributed environment" as well.