mbklein
mbklein

Reputation: 515

Postgresql trigger function occasionally fails to update its target correctly – possible race condition?

I have a system that's set up as a series of jobs and tasks. Each job is made up of several tasks, and there's another task_progress table that does nothing but record the current state of each task. (This information is kept separate from the main tasks table for business reasons that are not relevant to this issue.)

The jobs table has an overall status column, which needs to get updated to completed when all of the job's tasks reach a terminal state (ok or error). This is handled by a trigger function:

CREATE OR REPLACE FUNCTION update_job_status_when_progress_changes()
  RETURNS trigger AS $$
DECLARE
  current_job_id jobs.id%TYPE;
  pending integer;
BEGIN
  SELECT tasks.job_id INTO current_job_id
  FROM tasks WHERE tasks.id = NEW.task_id;

  SELECT COUNT(*) INTO pending
  FROM task_progress
  JOIN tasks ON task_progress.task_id = tasks.id
  WHERE tasks.job_id = current_job_id
  AND task_progress.status NOT IN ('ok', 'error');

  IF pending = 0 THEN
    UPDATE jobs
    SET status = 'completed', updated_at = NOW() AT TIME ZONE 'utc'
    WHERE jobs.id = current_job_id;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql

CREATE TRIGGER task_progress_update_job_status
  AFTER UPDATE OR DELETE
  ON task_progress
  FOR EACH ROW
  EXECUTE PROCEDURE update_job_status_when_progress_changes()

It's been almost entirely fine. But sometimes – like, maybe once every few hundred jobs – a job will fail to flip over to completed status. The progress rows are all correct; the business logic that displays a % complete based on the contents of the task_progress table hits 100%, but the status of the job stays at processing. We've been unable to reproduce it reliably; it's just something that happens now and then. But it's frustrating, and I'd like to nail it down.

There are no transactions involved; each task progress is updated atomically by the process that completes the task.

Is it possible to hit a situation where, e.g., the last two tasks in a job complete almost simultaneously, causing the trigger for Task A to see that Task B is still pending, and vice versa? I thought FOR EACH ROW was supposed to prevent race conditions like this, but I can't explain what I'm seeing otherwise.

What's my best option here?

Upvotes: 3

Views: 741

Answers (1)

Laurenz Albe
Laurenz Albe

Reputation: 247270

Yes, there is a race condition. If the last two tasks complete at about the same time, the trigger functions can run concurrently. Since the trigger runs as part of the transaction, and the transactions are both not committed yet, none of the trigger functions can see the data modifications made by the other transaction. So each believes there is still a task open.

You could use an advisory lock to make sure that that cannot happen: right before the SELECT count(*) ..., add

SELECT pg_advisory_xact_lock(42);

That makes sure that no session will execute the query while another session that has already executed the query is still not committed, because the lock is held until the end of the transaction.

Upvotes: 3

Related Questions