grolegor
grolegor

Reputation: 1330

Managing rapidly growing "outbox" table when implementing "Transactional Outbox Pattern"

My service needs to publish messages as part of a transaction that updates the database. I am using Postgres and Kafka, and I have decided to employ the Transactional Outbox pattern: https://microservices.io/patterns/data/transactional-outbox.html

The main idea behind the Transactional Outbox pattern is to write the message as part of the same transaction that updates the application's primary data source. This ensures atomicity and consistency by guaranteeing that both the data modification and the message creation either succeed together or fail together.

Within the database transaction that involves creating, updating, and deleting business objects, the service sends messages by inserting them into an outbox table. This insertion is performed within a local ACID transaction, ensuring atomicity. The outbox table serves as a temporary message queue.

In addition, I am utilizing Debezium to read the Postgres Write-Ahead Log (WAL) for the outbox table and publish the messages to Kafka.

However, I am currently facing a problem with the rapid growth of my outbox table. It has become crucial for me to find a reliable solution to delete events that have already been sent.

Do you have any ideas on how to solve this problem?

I have come up with several possible approaches:

  1. Deleting old data based on timestamps. However, this approach is not entirely reliable as some old data might not have been sent to Kafka.
  2. Subscribing to the Kafka topic from the same application and deleting data. In this approach, we can only read data that has already been sent to Kafka. We can retrieve the message IDs from the topic and delete the corresponding rows from the outbox table. To avoid excessive delete transactions to Postgres, it would be advisable to handle messages in large batches.
  3. Using the same logic as in the second approach but creating a separate microservice specifically for deleting data from the outbox table. This would allow us to scale this solution independently from the main service.

Upvotes: 2

Views: 1265

Answers (1)

Jon Senchyna
Jon Senchyna

Reputation: 8047

How are you processing messages in the outbox table?

If you are reading from the Write-Ahead Log (WAL), you can delete the messages as soon as you insert them, as the insertion entry in the WAL will still be there. This is a common practice when implementing this pattern.

You can even take it a step further and write directly to the WAL without even writing to a DB table using pg_logical-emit-message().

Upvotes: 1

Related Questions