Dr. Pro
Dr. Pro

Reputation: 41

Debezium with SQL Server take snapshot from actual table as well as capture table

I have implemented the outbox pattern using Debezium and SQL Server by closing following this guide: Reliable Microservices Data Exchange With the Outbox Pattern.

As per the guide, I'm doing an "insert + delete" of the outbox event in the same transaction in my application. This means that at the end of the transaction, the outbox table remains effectively empty. The transaction log is aware of the insert and delete operations so that information is not lost. SQL Server's CDC capture job then "tails" this transaction log and moves those records to a capture table. I've kept the retention period of this capture table as 3 days (default value).

When I start a Debezium connector, it first takes a snapshot of the outbox table and then moves on to capturing the events of the capture table as and when they come. The problem with this is that my outbox table is effectively empty so the snapshot doesn't result in anything. But there might be a few events in the capture table which were created before the connector was started.

Is there a way to configure Debezium to do a full snapshot of the capture table in addition to the outbox table? Or I have missed something in setting up the outbox pattern to avoid running into this issue?

I like the existing setup because it means I don't have to do any housekeeping to keep the outbox table to a suitable size. I'm aware that instead of doing an "insert + delete" in my application, I can only do an "insert" so that the outbox table is populated. Then I can have a stored procedure/external script which cleans up the records older than some time (3 days for example) from the outbox table. I am trying to avoid this overhead of having extra housekeeping process.

Upvotes: 0

Views: 493

Answers (1)

Naros
Naros

Reputation: 21133

The main issue here is more aligned with the "chicken & egg" paradigm.

If the connector is new, it's going to look at the transaction log and record the current position, perform any type of snapshot steps you've configured the connector to do, and only then will it begin streaming, but it does so from the current position we obtained at the start.

For the outbox pattern, if you want to maintain the idea of doing an insert + delete to miniimize the overhead of bookkeeping on the outbox table, you will simply want to make sure that you start the connector first, let it perform any initial snapshot tasks you've configured & record offsets. Once this is done, its then safe for you to then manipulate records in the outbox table and they'll be picked up and streamed as you had hoped.

Upvotes: 0

Related Questions