Reputation: 2776
Dear all,
I have been pondering a problem for some time now, and while I do have a conceptual solution, I am not sure there isn't an easier one, especially if there is already a library encapsulating such a solution.
Basically, I want to send a message to Kafka if and only if a database transaction commits. This message must be stored reliably, so that no outage (network or power) can cause it to be lost. No XA may be used.
Obviously, out of the blue this is hard. If the database transaction is committed prior to sending the message, a number of things can go wrong: power to the application server can be lost, network connections can break down at the most inconvenient moments, and so on. The same is true if the message to Kafka is sent prior to committing the database transaction.
My solution draft looks like this:
1. An `event` table is created in the database.
2. Within the business transaction, the message is written to the `event` table with status `in progress`, commit to DB.
3. The second process works in a journal style: it reads messages marked `in progress` from the `event` table, sends them to Kafka, and marks them as `sent`.
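For illustration, here is a minimal sketch of that draft using plain JDBC and the standard Kafka producer API; the table layout, topic name, and key type are my assumptions, not part of the question:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class OutboxSketch {

    // Steps 1 and 2: the message is written to the event table inside the
    // business transaction, so it commits or rolls back together with the
    // business data. Assumes auto-commit is disabled on the connection.
    static void recordEvent(Connection con, long id, String payload) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                "INSERT INTO event (id, payload, status) VALUES (?, ?, 'in progress')")) {
            ps.setLong(1, id);
            ps.setString(2, payload);
            ps.executeUpdate();
        }
        // ... the actual business statements run on the same connection ...
        con.commit();
    }

    // Step 3: the journal process reads unsent events, publishes them to
    // Kafka, and only marks an event 'sent' after the broker has
    // acknowledged it. A crash between send() and the UPDATE produces a
    // duplicate on re-send, i.e. the guarantee is at-least-once.
    static void relay(Connection con, KafkaProducer<Long, String> producer) throws Exception {
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT id, payload FROM event WHERE status = 'in progress' ORDER BY id")) {
            while (rs.next()) {
                long id = rs.getLong("id");
                producer.send(new ProducerRecord<>("events", id, rs.getString("payload")))
                        .get(); // block until Kafka acknowledges the write
                try (PreparedStatement upd = con.prepareStatement(
                        "UPDATE event SET status = 'sent' WHERE id = ?")) {
                    upd.setLong(1, id);
                    upd.executeUpdate();
                }
                con.commit();
            }
        }
    }
}
```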
The trick here would be to re-read messages from Kafka to recover if a system outage occurs. All messages which cannot be read back from Kafka must be considered lost and re-sent.
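That recovery pass could look roughly like this; a sketch assuming the event id travels as the record key and that scanning the topic from the beginning is affordable (a real system would bound the scan):

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;

public class ReadBackSketch {

    // Collect the event ids that actually made it into the topic. Events the
    // database marked 'sent' but which are missing here were lost mid-send
    // and can be flipped back to 'in progress' for the relay to re-send.
    static Set<Long> readBackIds(KafkaConsumer<Long, String> consumer, String topic) {
        List<TopicPartition> parts = new ArrayList<>();
        for (PartitionInfo p : consumer.partitionsFor(topic)) {
            parts.add(new TopicPartition(topic, p.partition()));
        }
        consumer.assign(parts);
        consumer.seekToBeginning(parts); // unbounded scan, for illustration only
        Set<Long> seen = new HashSet<>();
        ConsumerRecords<Long, String> records;
        while (!(records = consumer.poll(Duration.ofSeconds(1))).isEmpty()) {
            for (ConsumerRecord<Long, String> r : records) {
                seen.add(r.key()); // assumes the event id is the record key
            }
        }
        return seen;
    }
}
```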
Now, I assume I am not the only one who must not lose a message, so I can't imagine this is something which has to be implemented by hand. Am I right?
Upvotes: 0
Views: 2007
Reputation: 2776
After some more consulting and thinking, this turned out to be some kind of XY question. What my question originally asked was "how do I implement synchronous semantics with Kafka" - which you shouldn't.
Putting some thought into the business part of the application, we have not yet found a use case which actually requires exactly-once semantics. In some cases it is acceptable to lose messages; in other cases we can live with phantom messages.
If you really have a requirement for exactly-once, you will most certainly find that you have further requirements which scream for synchronous calls, like needing an actual confirmation that the action was executed. Do yourself a favour and try not to implement that using a messaging system.
Upvotes: 0
Reputation: 304
An alternative solution would be to run a CDC Kafka Connect connector to stream the data from the database table to the topic. The CDC connectors offer exactly-once delivery, so you won't face duplicate issues. This also deals with the issues arising from two-phase commits. The DB acts as the source of truth, and the CDC connector keeps track of the transactional log: even if the connector stops, when restarted it will start from where it failed and eventually sync with the DB state.
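As a concrete illustration, a Debezium PostgreSQL source connector run in Kafka Connect standalone mode might be configured roughly like this; all values are placeholders, and property names vary between Debezium versions:

```
name=outbox-cdc-connector
connector.class=io.debezium.connector.postgresql.PostgresConnector
database.hostname=db.example.com
database.port=5432
database.user=cdc_user
database.password=secret
database.dbname=appdb
table.include.list=public.event
topic.prefix=app
```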
Upvotes: 1