pepeday

Trigger to mark latest training record as valid without recursive execution

Description

I have a training table in PostgreSQL:

CREATE TABLE training  
(
    id SERIAL PRIMARY KEY,
    person_id INT NOT NULL,
    training_type_id INT NOT NULL,
    date_expires DATE NOT NULL,
    valid BOOLEAN NOT NULL DEFAULT FALSE
);

Goal

  1. When a record is inserted or updated:
    • Find the latest date_expires for the same person_id and training_type_id.
    • Mark it as valid = TRUE, and others as valid = FALSE.
  2. Prevent recursive trigger execution to avoid infinite loops.

Current solution

CREATE OR REPLACE FUNCTION update_valid_status()
RETURNS TRIGGER AS $$
BEGIN
  -- Bail out when fired by the UPDATEs below, to avoid recursion
  IF pg_trigger_depth() <> 1 THEN RETURN NEW; END IF;

  -- First clear the whole group for this person and training type
  UPDATE training
  SET valid = FALSE
  WHERE person_id = NEW.person_id
    AND training_type_id = NEW.training_type_id;

  -- Then mark the row with the latest date_expires as the valid one
  UPDATE training
  SET valid = TRUE
  WHERE ctid = (
    SELECT ctid
    FROM training
    WHERE person_id = NEW.person_id
      AND training_type_id = NEW.training_type_id
    ORDER BY date_expires DESC
    LIMIT 1
  );

  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_update_valid_status
AFTER INSERT OR UPDATE ON training
FOR EACH ROW
EXECUTE FUNCTION update_valid_status();

Questions

  1. Is this a good approach for preventing recursion and ensuring performance?
  2. How to handle concurrency issues with this setup?
  3. Any optimizations for large datasets?

Note

Yes, creating a view would be the optimal solution, but unfortunately Directus (which is the DB management layer) doesn't support views yet, so I need to implement this hacky solution for now.

Thank you!


Answers (1)

Zegarek

  1. Is this a good approach for preventing recursion and ensuring performance?

I'll address performance under question 3. To prevent recursion, you could also add a WHEN condition to the trigger:

CREATE TRIGGER trg_update_valid_status_ins
AFTER INSERT ON training
FOR EACH ROW
EXECUTE FUNCTION update_valid_status();

CREATE TRIGGER trg_update_valid_status_upd
AFTER UPDATE ON training
FOR EACH ROW
WHEN ((NEW.date_expires, NEW.person_id, NEW.training_type_id)
      IS DISTINCT FROM
      (OLD.date_expires, OLD.person_id, OLD.training_type_id))
EXECUTE FUNCTION update_valid_status();

This prevents the trigger from firing at all on the update issued from inside the trigger, though limiting the trigger depth works too. The WHEN condition prevents it from firing entirely, while pg_trigger_depth() <> 1 halts it one step later - note that it's one step times however many rows are being affected. The WHEN condition also keeps the trigger from firing on updates that only change columns unrelated to what's valid.

It's worth pointing out that you didn't handle the DELETE event, which introduces a window of time when there is no valid row at all: the valid one got deleted, and no insert/update has happened for that person and type to set a new one.
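
A minimal sketch of how a delete could be covered, promoting the next-latest row for the deleted row's person and type (the function and trigger names here are illustrative, not part of your schema):

CREATE OR REPLACE FUNCTION promote_on_delete()
RETURNS TRIGGER AS $$
BEGIN
  -- Nothing to do unless the deleted row was the valid one
  IF NOT OLD.valid THEN RETURN OLD; END IF;

  -- Promote the now-latest remaining row, if any
  UPDATE training
  SET valid = TRUE
  WHERE ctid = (
    SELECT ctid
    FROM training
    WHERE person_id = OLD.person_id
      AND training_type_id = OLD.training_type_id
    ORDER BY date_expires DESC
    LIMIT 1
  );
  RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_promote_on_delete
AFTER DELETE ON training
FOR EACH ROW
EXECUTE FUNCTION promote_on_delete();

The inner UPDATE only changes valid, so the WHEN condition above keeps the update trigger from re-firing on it.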


  2. How to handle concurrency issues with this setup?

Add a partial unique index:

CREATE UNIQUE INDEX ON training (person_id, training_type_id) WHERE valid;

That will prevent multiple clients from setting different rows as valid for the same person and training type whenever they happen to run concurrently against the same subset of rows: only a single record can be valid at a time for any given person and training type.

Also, lock your rows explicitly in a BEFORE trigger. Note that since you're in PL/pgSQL and don't need to save those rows into anything, you use PERFORM rather than SELECT:

PERFORM FROM training
WHERE person_id = NEW.person_id
  AND training_type_id = NEW.training_type_id
FOR UPDATE; -- locked exclusively until the end of the transaction

Two clients operating concurrently could each compare their payload to what's visible from their AFTER trigger's perspective and both conclude that their own new row is the newest one. The index stops them from both setting their row as the current valid one, but it doesn't guarantee they'll do their set/unset in the right order, only that at most one row is set at a time.

The explicit lock tells concurrent clients trying to affect the same person and training type combination to queue up, wait, then proceed in sequence, one by one.
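
Wired into a trigger, that lock could look something like this (a sketch; the function and trigger names are made up):

CREATE OR REPLACE FUNCTION lock_training_group()
RETURNS TRIGGER AS $$
BEGIN
  -- Serialize concurrent writers touching the same person/type group:
  -- whoever arrives second waits until the first transaction ends
  PERFORM FROM training
  WHERE person_id = NEW.person_id
    AND training_type_id = NEW.training_type_id
  FOR UPDATE;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_lock_training_group
BEFORE INSERT OR UPDATE ON training
FOR EACH ROW
EXECUTE FUNCTION lock_training_group();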


  3. Any optimizations for large datasets?

Add AND valid:

UPDATE training
SET valid = FALSE
WHERE person_id = NEW.person_id
  AND training_type_id = NEW.training_type_id
  AND valid;

That saves a rewrite of all rows for this person and type. Overwriting a value with the same value isn't a no-op: it still has to reach the page, mark the old tuple for deletion, and write a new one, on the same page or another. Adding AND valid lets the statement unset the old valid row and set the new one without touching any innocent-bystander rows.

A much larger improvement would be keeping the one valid training per person_id and training_type_id in a separate table (or even just an FK to it) and letting the trigger only check whether the incoming row has a higher date_expires, swapping it out if needed. If you really need the valid flag in your main training table, let the trigger unset the old valid row and set the new one when doing the swap, still based on that small, separate table (see the sketch after this list). That saves you:

  1. A ton of needless re-scans of the training table
  2. I/O: the false-to-false updates mentioned earlier. If you keep the valid column on the tiny, separate table, this trigger doesn't have to write anything back to training at all (which also prevents trigger self-firing completely).
  3. Time it takes to find and lock a row - you're pinpointing it in a tiny table almost immediately.
  4. Time it takes to count(*). If you make the small table pull double duty as a tally table by letting the trigger increase/decrease a count column, you can sum(count) from the tally table whenever you need the training table's count(*). Or a count per person, per training type, or per combination of both.
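
A sketch of that separate-table idea, with valid moved off the main table entirely; the current_training table and everything about its shape are assumptions, not part of your schema:

CREATE TABLE current_training
(
    person_id        INT NOT NULL,
    training_type_id INT NOT NULL,
    training_id      INT NOT NULL REFERENCES training (id),
    date_expires     DATE NOT NULL,
    PRIMARY KEY (person_id, training_type_id)
);

CREATE OR REPLACE FUNCTION track_current_training()
RETURNS TRIGGER AS $$
BEGIN
  -- One-row upsert against the tiny table: swap in the incoming row
  -- only if it expires later than the currently tracked one
  INSERT INTO current_training AS c
         (person_id, training_type_id, training_id, date_expires)
  VALUES (NEW.person_id, NEW.training_type_id, NEW.id, NEW.date_expires)
  ON CONFLICT (person_id, training_type_id) DO UPDATE
  SET training_id  = EXCLUDED.training_id,
      date_expires = EXCLUDED.date_expires
  WHERE c.date_expires < EXCLUDED.date_expires;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_track_current_training
AFTER INSERT OR UPDATE ON training
FOR EACH ROW
EXECUTE FUNCTION track_current_training();

Whether a given training row is valid then becomes a join against current_training instead of a flag that has to be kept in sync.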

Another thing that would make it simpler, cleaner and faster is a statement-level trigger. An insert of 100 rows fires a row-level trigger 100 times, while a statement-level trigger always fires just once per statement, regardless of the size of your payload.
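
For example (a sketch assuming PostgreSQL 10+ for the transition table; the names are again illustrative), an AFTER INSERT statement-level trigger can revalidate every affected group in one pass:

CREATE OR REPLACE FUNCTION revalidate_groups()
RETURNS TRIGGER AS $$
BEGIN
  -- For every (person, type) group touched by this statement, set
  -- valid on its latest row and clear it everywhere else, writing
  -- only to rows whose flag actually changes
  UPDATE training t
  SET valid = (t.ctid = latest.ctid)
  FROM (
    SELECT DISTINCT ON (tr.person_id, tr.training_type_id)
           tr.ctid, tr.person_id, tr.training_type_id
    FROM training tr
    JOIN new_rows n USING (person_id, training_type_id)
    ORDER BY tr.person_id, tr.training_type_id, tr.date_expires DESC
  ) latest
  WHERE t.person_id = latest.person_id
    AND t.training_type_id = latest.training_type_id
    AND t.valid IS DISTINCT FROM (t.ctid = latest.ctid);
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_revalidate_groups
AFTER INSERT ON training
REFERENCING NEW TABLE AS new_rows
FOR EACH STATEMENT
EXECUTE FUNCTION revalidate_groups();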
