Reputation: 4393
I am trying to understand whether there is a fundamental difference between what the two are trying to achieve. My use case is landing my Postgres data in a data lake, and these are the two paved-road approaches available to me.
Option 1: Create an outbox table in my database and write to it in the same transaction as my main tables. A tool, Ceres, then picks up the change (CDC) and publishes it to Kafka.
Option 2: Connect my Postgres database to a Debezium connector. Debezium reads the WAL and continuously publishes the changes in my database to the data lake.
At first sight, Option 2 looks like the neater, cleaner approach, with no overhead of writing to an outbox table. Is my deduction correct? Is the outbox pattern a legacy pattern that is now redundant, since Debezium can accomplish the same thing in a simpler way?
Upvotes: 0
Views: 2508
Reputation: 1136
Yes, Option 2 seems neater and cleaner. However, the benefit of an outbox table is that it can represent your message structure explicitly. Without it, you may end up introducing the message model into your main tables or hiding the message-creation logic inside the CDC tool. In that sense, Option 1 is cleaner! So it depends on which style you prefer.
Upvotes: 1
Reputation: 745
As you mentioned, both options will work. I think the difference is that the first option makes the use of an event explicit in the domain/services: you have to create the outbox-event table, define the entities/aggregates that can be published, add the "insert event" logic to your code, etc.
The first approach seems more appropriate to use in microservices communication, where as part of the logic you want a service to publish an event, so you model this explicitly.
The second option seems more appropriate for "data lake" needs like yours, where you want to collect data into a data lake but are not so much interested in modeling events.
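To make the first option concrete, here is a minimal sketch of the "insert event" logic: the business row and the outbox row are written in one transaction, so either both are committed or neither is. It uses `sqlite3` only as a runnable stand-in for Postgres, and the table layout, column names, and `place_order` function are assumptions for illustration, not something prescribed by Ceres or Debezium.

```python
import json
import sqlite3
import uuid

# sqlite3 is only a stand-in so the sketch runs anywhere; with Postgres the
# same two INSERTs would go through a Postgres driver instead.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, customer TEXT, total REAL);
    CREATE TABLE outbox (
        id TEXT PRIMARY KEY,
        aggregate_type TEXT,
        aggregate_id TEXT,
        event_type TEXT,
        payload TEXT
    );
""")


def place_order(conn, customer, total):
    """Write the order and its event in a single transaction (hypothetical schema)."""
    order_id = str(uuid.uuid4())
    event = {"order_id": order_id, "customer": customer, "total": total}
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute(
            "INSERT INTO orders VALUES (?, ?, ?)",
            (order_id, customer, total),
        )
        conn.execute(
            "INSERT INTO outbox VALUES (?, ?, ?, ?, ?)",
            (str(uuid.uuid4()), "order", order_id, "OrderPlaced", json.dumps(event)),
        )
    return order_id


order_id = place_order(conn, "alice", 42.0)
```

The event payload here is shaped independently of the `orders` table, which is the explicit modeling this option asks for.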
Upvotes: 1
Reputation: 319
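The outbox pattern is a way to avoid the two-phase-commit problem: you want to update the database and publish a message atomically, without a distributed transaction across the database and the broker. One way to implement it is with Debezium connectors (another would be to poll the outbox table).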
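As a rough illustration of the polling alternative, the sketch below reads unpublished rows from an outbox table, hands each one to a publisher, and then marks it as published. It again uses `sqlite3` as a stand-in for Postgres; the `published` flag column, the `relay_once` function, and the `publish` callback are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        event_type TEXT,
        payload TEXT,
        published INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO outbox (event_type, payload) VALUES ('OrderPlaced', '{"order_id": 1}');
    INSERT INTO outbox (event_type, payload) VALUES ('OrderPlaced', '{"order_id": 2}');
""")


def relay_once(conn, publish, batch_size=100):
    """Publish pending outbox rows in insertion order; return how many were read."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox "
        "WHERE published = 0 ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, payload)  # if this raises, the row stays unpublished
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


sent = []  # stands in for a real Kafka producer
first = relay_once(conn, lambda event_type, payload: sent.append((event_type, payload)))
second = relay_once(conn, lambda event_type, payload: sent.append((event_type, payload)))
```

Because a row is marked only after it has been published, a crash between the two steps leads to a resend, so consumers of such a relay should tolerate duplicates.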
You do not need the outbox pattern to use Debezium; you can, for example, monitor your entity tables directly with a Debezium connector.
If you want to use Debezium connectors, you need to enable CDC on the database. CDC simply means Change Data Capture: a way to capture data changes in your database.
Debezium itself has a good article about using their connectors to implement the outbox pattern: https://debezium.io/blog/2019/02/19/reliable-microservices-data-exchange-with-the-outbox-pattern/
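The two ways of using Debezium described above differ mostly in connector configuration. The sketch below builds the JSON body one might POST to Kafka Connect's `/connectors` endpoint, either capturing entity tables directly or capturing only the outbox table and routing it with Debezium's outbox event router. Host names, credentials, table names, and the `connector_config` helper are placeholders, and `topic.prefix` assumes Debezium 2.x (older releases used `database.server.name`), so check the documentation for your version before relying on any of it.

```python
import json


def connector_config(name, tables, use_outbox_router):
    """Build a Kafka Connect request body for a Debezium Postgres connector.

    All connection values below are placeholders for illustration.
    """
    config = {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",
        "database.hostname": "localhost",
        "database.port": "5432",
        "database.user": "postgres",
        "database.password": "postgres",
        "database.dbname": "appdb",
        "topic.prefix": "app",
        "table.include.list": ",".join(tables),
    }
    if use_outbox_router:
        # Single message transform that turns outbox rows into routed events.
        config["transforms"] = "outbox"
        config["transforms.outbox.type"] = "io.debezium.transforms.outbox.EventRouter"
    return {"name": name, "config": config}


# Capture the entity tables directly (no outbox table involved).
direct = connector_config("orders-cdc", ["public.orders", "public.customers"], False)

# Capture only the outbox table and route its rows as events.
outbox = connector_config("outbox-cdc", ["public.outbox"], True)

body = json.dumps(outbox, indent=2)
```

In the first case the published messages mirror the table rows; in the second they carry whatever event structure the application wrote to the outbox table.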
Upvotes: 0