Romil Punetha

Reputation: 81

Comparing CDC vs Outbox Pattern for creating event streams

I'm working on creating event streams using the Outbox pattern. Why would one choose the Outbox pattern over using CDC directly on the required tables?

Pros of using CDC directly:

  1. The stream is always in order, regardless of when event capturing was introduced: the connector takes a snapshot of all existing data and captures every change from then on.
  2. It requires no application changes. The application keeps working as is.

Cons:

  1. The db event has to be parsed manually (or with an existing parser class, like the one available for outbox events).
  2. It does not filter out unnecessary events. E.g. if a record changes 100 times but only the initial and final states are required, all 100 events are still emitted. Selective writing to the outbox alleviates this problem.
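To make the "selective writing" point concrete, here is a minimal sketch of an outbox write, assuming an in-memory SQLite database. The `orders` and `outbox` tables, the `PUBLISHED_STATUSES` rule and the `set_order_status` helper are all hypothetical names chosen for illustration; the only part that matters is that the business row and the outbox row are written in the same transaction, and that intermediate states are simply not written to the outbox.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        aggregate_id INTEGER NOT NULL,
        event_type TEXT NOT NULL,
        payload TEXT NOT NULL
    );
""")

# Hypothetical rule: only these states are interesting to consumers.
PUBLISHED_STATUSES = {"CREATED", "DELIVERED"}


def set_order_status(conn, order_id, status):
    """Store the new status and, if it matters downstream, add an outbox row atomically."""
    with conn:  # one transaction: commits on success, rolls back on exception
        conn.execute(
            "INSERT OR REPLACE INTO orders (id, status) VALUES (?, ?)",
            (order_id, status),
        )
        if status in PUBLISHED_STATUSES:
            payload = json.dumps({"orderId": order_id, "status": status})
            conn.execute(
                "INSERT INTO outbox (aggregate_id, event_type, payload) VALUES (?, ?, ?)",
                (order_id, "Order" + status.capitalize(), payload),
            )


# Four state changes on the table, but only two events reach the outbox.
for status in ["CREATED", "PACKED", "SHIPPED", "DELIVERED"]:
    set_order_status(conn, 1, status)

events = [row[0] for row in conn.execute("SELECT event_type FROM outbox ORDER BY id")]
```

A CDC connector pointed at `orders` would see all four changes, while one pointed at `outbox` would see only the two rows the application chose to publish.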

On further reading, one point that came up in favour of the outbox was that it separates db design from the message contract. However, the downside that's bothering me is that the outbox only captures events from the day the code goes live. All previous events need to be replayed and ingested into the outbox, which breaks the order of the stream, as older events will appear as the latest events in the outbox. That is something one doesn't have to worry about when using CDC directly.

Any insights on which approach is more efficient here?

Upvotes: 7

Views: 2637

Answers (1)

Gerard Garcia

Reputation: 1856

Personally, my biggest concern would be coupling the DB schema to the stream message format, although it may depend on whether you are building an event-based architecture or implementing a CQRS pattern with an event-sourcing pipeline.

One of the points of having an event-based architecture is being able to keep all services decoupled by properly modeling the events. You lose that with a CDC connector, since you have almost no control over what the events will look like.

On the other hand, if you are using the event stream to build an optimized version of the same data (the query path in a CQRS pattern), it may be fine to just plug a CDC connector into a table (e.g. a CDC connector on a 'blog' table to build an Elasticsearch index that powers the site search), even though it exposes the table schema to the stream.

Since it seems that you are also worried about coupling the db schema to the message format, and you want to easily get all the table contents into the stream, you could use a hybrid approach. Use a CDC connector to get the events, but post-process them (filter fields, transform them and so on) so that the messages the downstream services end up using are not coupled to the db schema.
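As a rough sketch of that post-processing step, assuming the connector emits a Debezium-style envelope with `op`, `before`, `after` and `ts_ms` fields: the `FIELD_MAP` contract, the event type names and the `to_domain_event` function below are hypothetical, and treating snapshot reads (`r`) as "created" events is an assumption you may want to change.

```python
from typing import Optional

# Hypothetical public contract: table column -> message field.
# Columns not listed here never leave the database.
FIELD_MAP = {"id": "postId", "title": "title", "published_at": "publishedAt"}

# Assumed mapping from change operation to domain event type.
# "r" is a snapshot read, treated here like a create.
EVENT_TYPES = {"c": "PostCreated", "r": "PostCreated", "u": "PostUpdated", "d": "PostDeleted"}


def to_domain_event(change: dict) -> Optional[dict]:
    """Translate one raw change event into a schema-independent message."""
    op = change.get("op")
    event_type = EVENT_TYPES.get(op)
    if event_type is None:
        return None  # unknown operation: drop it
    # Deletes carry the row state in "before", everything else in "after".
    row = change.get("before") if op == "d" else change.get("after")
    if row is None:
        return None
    data = {public: row[col] for col, public in FIELD_MAP.items() if col in row}
    return {"type": event_type, "occurredAtMs": change.get("ts_ms"), "data": data}


change = {
    "op": "u",
    "ts_ms": 1700000000000,
    "before": {"id": 7, "title": "Draft", "internal_notes": "wip", "published_at": None},
    "after": {"id": 7, "title": "Hello", "internal_notes": "wip", "published_at": "2023-11-14"},
}
event = to_domain_event(change)
```

With this in place, a column rename in the 'blog' table only requires a change to `FIELD_MAP`; the downstream consumers keep seeing the same message shape.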

Upvotes: 7
