Reputation: 1707
Given tables table_a
and table_b
, I need to reliably (and concurrently) perform an operation like this whenever table_a
is updated:
SELECT
some rows from table_a
.UPDATE
a row in table_b
.I need to guard against something like this happening (call this scenario A), where table_b
winds up reflecting an old version of table_a
:
worker1
gets rows from table_a
.table_a
is updated.worker2
gets rows from table_a
.worker2
updates table_b
.worker1
updates table_b
.But this is fine (call this scenario B) because table_b
winds up in the right state:
worker1
gets rows from table_a
.table_a
is updated.worker2
gets rows from table_a
.worker1
updates table_b
.worker2
updates table_b
.One solution is to wrap the whole thing in a REPEATABLE READ
transaction. Then worker1
's transaction fails in scenario A and worker2
's transaction fails in scenario B. Without a way to distinguish the scenarios, the only choice is to retry the failed transactions.
But it's wasteful in both scenarios: In scenario A, we'd rather not retry worker1
's transaction because table_b
is already fully updated. And in scenario B, we'd rather not fail worker2
's transaction in the first place because it was doing the right thing.
If we know the primary key of the row in table_b
from the start (:b_id
) and each worker has some unique ID (:worker_id
), we can try something else. Add a mark
column to table_b
and let each worker do this before step 1:
UPDATE table_b SET mark = :worker_id WHERE id = :b_id;
Then in step 3 add a WHERE
clause:
UPDATE table_b SET ... WHERE ... AND mark = :worker_id;
Now worker1
updates no rows in step 3 in both scenarios, as desired.
Is row marking a reasonable approach here? What drawbacks am I missing? What's the "canonical" solution to this problem?
Clarification: I'm using PostgreSQL.
Upvotes: 1
Views: 203
Reputation: 26464
Use transactions and SELECT ... FOR UPDATE on table b.
This will cause row locking which will prevent worker two from updating table b until worker 1 commits. If you start your workflow with the select for update, then worker 2 cannot start until worker 1 commits.
A second approach (probably better) would be to wrap the update in a statement that would do the select so it is a single statement, and would handle the locking automatically.
Row marking needs to be discarded as an idea because worker two won't see the marked row until worker 1 commits....
Upvotes: 2