Christian Schmitt

Reputation: 892

job queue with multiple consumers did the same job twice

A lot of this may already be covered here: Job queue as SQL table with multiple consumers (PostgreSQL)

However, I just wanted to ask about my specific query.

Currently I have a job queue that should emit a new job to every consumer, but we found out that we sometimes got the same job twice on different consumers (probably a race condition). This was our query (run inside a transaction):

UPDATE invoice_job SET status = 'working', date_time_start = now(), 
node = $ip 
WHERE id = (SELECT id FROM invoice_job WHERE status = 'created' ORDER BY id LIMIT 1)
RETURNING *

Currently the table is pretty simple: it has a status field (can be "created", "working", "done"), a date_time_start field, a created field (not used for the query), an id field, and a node field (where the job was run).

However, this emitted the same job twice at one point. I have now changed the query to:

UPDATE invoice_job SET status = 'working', date_time_start = now(), 
node = $ip 
WHERE id = (SELECT id FROM invoice_job WHERE status = 'created' ORDER BY id LIMIT 1 FOR UPDATE SKIP LOCKED)
RETURNING *

Would that actually help and emit each job only once?

Upvotes: 1

Views: 555

Answers (1)

Craig Ringer

Reputation: 324841

Your solution with FOR UPDATE SKIP LOCKED is fine. It'll ensure a row is locked by exactly one session before being updated for processing. No transaction can choose a row already locked by another transaction, and when the lock is released on commit, subsequent SELECT clauses will no longer match the row.
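To make that concrete, a full worker cycle with the fixed query might look like the sketch below. It uses the columns described in the question; `$ip` and `$claimed_id` stand for parameters your application supplies, and the separate "done" step is an assumption about how you finish a job, not something from the question:

```sql
BEGIN;

-- Claim exactly one unclaimed job; SKIP LOCKED makes concurrent
-- workers skip rows another session has already locked.
UPDATE invoice_job
SET    status = 'working', date_time_start = now(), node = $ip
WHERE  id = (SELECT id
             FROM   invoice_job
             WHERE  status = 'created'
             ORDER  BY id
             LIMIT  1
             FOR UPDATE SKIP LOCKED)
RETURNING *;

COMMIT;  -- releases the row lock; the row is now status = 'working',
         -- so no other worker's subquery will match it

-- ... process the job outside any long-lived transaction ...

-- Mark the job finished (hypothetical completion step)
UPDATE invoice_job SET status = 'done' WHERE id = $claimed_id;
```

Committing the claim before doing the work keeps transactions short; the `status = 'working'` value, not the lock, is what hides the row from other workers afterwards.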

The original failed because the subquery's SELECT can choose the same row concurrently in multiple sessions, each of which then tries to UPDATE the row. There's no WHERE clause in the UPDATE that'd make that fail; it's perfectly fine for two concurrent sessions to UPDATE invoice_job SET status = 'working' WHERE node = 42 or whatever. The second update will happily run and commit once the first update's transaction commits.
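You can see the race by interleaving two sessions by hand. A sketch of the failing timeline, written as annotated SQL against the question's table:

```sql
-- Session A: BEGIN;
-- Session A: subquery runs, picks id = 1 (a plain SELECT takes no row lock)
-- Session B: BEGIN;
-- Session B: subquery runs, also picks id = 1 -- nothing stops it
-- Session A: UPDATE invoice_job SET status = 'working' WHERE id = 1;
--            -- A now holds the row lock on id = 1
-- Session B: UPDATE invoice_job SET status = 'working' WHERE id = 1;
--            -- B blocks, waiting on A's row lock
-- Session A: COMMIT;  -- releases the lock
-- Session B: UPDATE proceeds; its WHERE id = 1 still matches, so it
--            overwrites the row rather than failing
-- Session B: COMMIT;  -- job 1 has now been handed out twice
```

With FOR UPDATE SKIP LOCKED in the subquery, session B's subquery would skip the locked row and pick id = 2 instead, so the interleaving above cannot occur.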

You could also make it safe by repeating the WHERE clause in the UPDATE:

UPDATE invoice_job SET status = 'working', date_time_start = now(), 
node = $ip 
WHERE id = (SELECT id FROM invoice_job WHERE status = 'created' ORDER BY id LIMIT 1)
  AND status = 'created'
RETURNING *

... but this will often return zero rows under high concurrency.

In fact it will return zero rows for all but one of a set of concurrent executions, so it's no better than a serial queue worker. This is true of most of the other "clever" tricks people use to try to do concurrent queues, and one of the main reasons SKIP LOCKED was introduced.

The fact that you only noticed this problem now tells me that you would actually be fine with a simple, serial queue dispatch where you LOCK TABLE before picking the first row. But SKIP LOCKED will scale better if your workload grows.
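For completeness, the serial variant could be sketched like this. The lock mode is my assumption; EXCLUSIVE conflicts with every other EXCLUSIVE locker (serializing the claimers) while still permitting plain reads:

```sql
BEGIN;

-- Serialize all job claimers: only one session at a time gets past this.
LOCK TABLE invoice_job IN EXCLUSIVE MODE;

UPDATE invoice_job
SET    status = 'working', date_time_start = now(), node = $ip
WHERE  id = (SELECT id
             FROM   invoice_job
             WHERE  status = 'created'
             ORDER  BY id
             LIMIT  1)
RETURNING *;

COMMIT;  -- releases the table lock, letting the next claimer in
```

This is correct but makes claiming a job a single-file operation, which is exactly the scaling limit SKIP LOCKED removes.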

Upvotes: 2
