Reputation: 515
I have a system that's set up as a series of jobs
and tasks
. Each job is made up of several tasks, and there's another task_progress
table that does nothing but record the current state of each task. (This information is kept separate from the main tasks
table for business reasons that are not relevant to this issue.)
The jobs
table has an overall status
column, which needs to get updated to completed
when all of the job's tasks reach a terminal state (ok
or error
). This is handled by a trigger function:
CREATE OR REPLACE FUNCTION update_job_status_when_progress_changes()
RETURNS trigger AS $$
DECLARE
current_job_id jobs.id%TYPE;
pending integer;
BEGIN
SELECT tasks.job_id INTO current_job_id
FROM tasks WHERE tasks.id = NEW.task_id;
SELECT COUNT(*) INTO pending
FROM task_progress
JOIN tasks ON task_progress.task_id = tasks.id
WHERE tasks.job_id = current_job_id
AND task_progress.status NOT IN ('ok', 'error');
IF pending = 0 THEN
UPDATE jobs
SET status = 'completed', updated_at = NOW() AT TIME ZONE 'utc'
WHERE jobs.id = current_job_id;
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql
CREATE TRIGGER task_progress_update_job_status
AFTER UPDATE OR DELETE
ON task_progress
FOR EACH ROW
EXECUTE PROCEDURE update_job_status_when_progress_changes()
It's been almost entirely fine. But sometimes – like, maybe once every few hundred jobs – a job will fail to flip over to completed
status. The progress rows are all correct; the business logic that displays a % complete based on the contents of the task_progress
table hits 100%, but the status of the job stays at processing
. We've been unable to reproduce it reliably; it's just something that happens now and then. But it's frustrating, and I'd like to nail it down.
There are no transactions involved; each task progress is updated atomically by the process that completes the task.
Is it possible to hit a situation where, e.g., the last two tasks in a job complete almost simultaneously, causing the trigger for Task A to see that Task B is still pending, and vice versa? I thought FOR EACH ROW
was supposed to prevent race conditions like this, but I can't explain what I'm seeing otherwise.
What's my best option here?
Upvotes: 3
Views: 741
Reputation: 247270
Yes, there is a race condition. If the last two tasks complete at about the same time, the trigger functions can run concurrently. Since the trigger runs as part of the transaction, and the transactions are both not committed yet, none of the trigger functions can see the data modifications made by the other transaction. So each believes there is still a task open.
You could use an advisory lock to make sure that that cannot happen: right before the SELECT count(*) ...
, add
SELECT pg_advisory_xact_lock(42);
That makes sure that no session will execute the query while another session that has already executed the query is still not committed, because the lock is held until the end of the transaction.
Upvotes: 3