Transactions vs row marking for reliable SELECT + UPDATE

Question

The problem:

Given tables table_a and table_b, I need to reliably (and concurrently) perform an operation like this whenever table_a is updated:

SELECT some rows from table_a.
Compute something in application code.
UPDATE a row in table_b.

I need to guard against something like this happening (call this scenario A), where table_b winds up reflecting an old version of table_a:

worker1 gets rows from table_a.
table_a is updated.
worker2 gets rows from table_a.
worker2 updates table_b.
worker1 updates table_b.

But this is fine (call this scenario B) because table_b winds up in the right state:

worker1 gets rows from table_a.
table_a is updated.
worker2 gets rows from table_a.
worker1 updates table_b.
worker2 updates table_b.

Transactions:

One solution is to wrap the whole thing in a REPEATABLE READ transaction. Then worker1's transaction fails in scenario A and worker2's transaction fails in scenario B. Without a way to distinguish the scenarios, the only choice is to retry the failed transactions.

But it's wasteful in both scenarios: In scenario A, we'd rather not retry worker1's transaction because table_b is already fully updated. And in scenario B, we'd rather not fail worker2's transaction in the first place because it was doing the right thing.

Row marking:

If we know the primary key of the row in table_b from the start (:b_id) and each worker has some unique ID (:worker_id), we can try something else. Add a mark column to table_b and let each worker do this before step 1:

UPDATE table_b SET mark = :worker_id WHERE id = :b_id;

Then in step 3 add a WHERE clause:

UPDATE table_b SET ... WHERE ... AND mark = :worker_id;

Now worker1 updates no rows in step 3 in both scenarios, as desired.

Is row marking a reasonable approach here? What drawbacks am I missing? What's the "canonical" solution to this problem?

Clarification: I'm using PostgreSQL.

Chris Travers · Accepted Answer

Use transactions and SELECT ... FOR UPDATE on table b.

This will cause row locking which will prevent worker two from updating table b until worker 1 commits. If you start your workflow with the select for update, then worker 2 cannot start until worker 1 commits.

A second approach (probably better) would be to wrap the update in a statement that would do the select so it is a single statement, and would handle the locking automatically.

Row marking needs to be discarded as an idea because worker two won't see the marked row until worker 1 commits....

Transactions vs row marking for reliable SELECT + UPDATE

The problem:

Transactions:

Row marking:

Answers (1)

Related Questions