Mike Craig
Mike Craig

Reputation: 1707

Transactions vs row marking for reliable SELECT + UPDATE

The problem:

Given tables table_a and table_b, I need to reliably (and concurrently) perform an operation like this whenever table_a is updated:

  1. SELECT some rows from table_a.
  2. Compute something in application code.
  3. UPDATE a row in table_b.

I need to guard against something like this happening (call this scenario A), where table_b winds up reflecting an old version of table_a:

But this is fine (call this scenario B) because table_b winds up in the right state:

Transactions:

One solution is to wrap the whole thing in a REPEATABLE READ transaction. Then worker1's transaction fails in scenario A and worker2's transaction fails in scenario B. Without a way to distinguish the scenarios, the only choice is to retry the failed transactions.

But it's wasteful in both scenarios: In scenario A, we'd rather not retry worker1's transaction because table_b is already fully updated. And in scenario B, we'd rather not fail worker2's transaction in the first place because it was doing the right thing.

Row marking:

If we know the primary key of the row in table_b from the start (:b_id) and each worker has some unique ID (:worker_id), we can try something else. Add a mark column to table_b and let each worker do this before step 1:

UPDATE table_b SET mark = :worker_id WHERE id = :b_id;

Then in step 3 add a WHERE clause:

UPDATE table_b SET ... WHERE ... AND mark = :worker_id;

Now worker1 updates no rows in step 3 in both scenarios, as desired.

Is row marking a reasonable approach here? What drawbacks am I missing? What's the "canonical" solution to this problem?

Clarification: I'm using PostgreSQL.

Upvotes: 1

Views: 203

Answers (1)

Chris Travers
Chris Travers

Reputation: 26464

Use transactions and SELECT ... FOR UPDATE on table b.

This will cause row locking which will prevent worker two from updating table b until worker 1 commits. If you start your workflow with the select for update, then worker 2 cannot start until worker 1 commits.

A second approach (probably better) would be to wrap the update in a statement that would do the select so it is a single statement, and would handle the locking automatically.

Row marking needs to be discarded as an idea because worker two won't see the marked row until worker 1 commits....

Upvotes: 2

Related Questions