Reputation: 97
Description
I have a training table in PostgreSQL:
CREATE TABLE training
(
    id SERIAL PRIMARY KEY,
    person_id INT NOT NULL,
    training_type_id INT NOT NULL,
    date_expires DATE NOT NULL,
    valid BOOLEAN NOT NULL DEFAULT FALSE
);
Goal
The row with the latest date_expires for the same person_id and training_type_id should be marked valid = TRUE, and all others as valid = FALSE.
Current solution
CREATE OR REPLACE FUNCTION update_valid_status()
RETURNS TRIGGER AS $$
BEGIN
    -- Guard against recursion: skip when fired by the UPDATEs below.
    IF pg_trigger_depth() <> 1 THEN RETURN NEW; END IF;
    UPDATE training
    SET valid = FALSE
    WHERE person_id = NEW.person_id
      AND training_type_id = NEW.training_type_id;
    UPDATE training
    SET valid = TRUE
    WHERE ctid = (
        SELECT ctid
        FROM training
        WHERE person_id = NEW.person_id
          AND training_type_id = NEW.training_type_id
        ORDER BY date_expires DESC
        LIMIT 1
    );
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_update_valid_status
AFTER INSERT OR UPDATE ON training
FOR EACH ROW
EXECUTE FUNCTION update_valid_status();
Questions
- Is this a good approach for preventing recursion and ensuring performance?
- How to handle concurrency issues with this setup?
- Any optimizations for large datasets?
Note
Yes, creating a view would be the optimal solution, but unfortunately directus (the DB management layer here) doesn't support views yet, so I need to implement this hacky solution for now.
Thank you!
Upvotes: 0
Views: 30
Reputation: 26347
- Is this a good approach for preventing recursion and ensuring performance?
I'll address performance in 3. To prevent recursion, you could also add a when (condition) to the trigger:
-- Split into two triggers so the UPDATE one can carry a WHEN condition
-- (trigger names must be unique per table, so they can't both be called
-- trg_update_valid_status):
CREATE TRIGGER trg_update_valid_status_ins
AFTER INSERT ON training
FOR EACH ROW
EXECUTE FUNCTION update_valid_status();

CREATE TRIGGER trg_update_valid_status_upd
AFTER UPDATE ON training
FOR EACH ROW
WHEN ((new.date_expires, new.person_id, new.training_type_id)
      IS DISTINCT FROM
      (old.date_expires, old.person_id, old.training_type_id))
EXECUTE FUNCTION update_valid_status();
This prevents the trigger from firing on the update issued from inside the trigger, but limiting the trigger depth works too. The when condition would prevent it from firing entirely, while pg_trigger_depth() <> 1 halts it one step later - note that it's one step times however many rows are being affected. It also won't fire at all on updates that change columns unrelated to what determines valid.
It's worth pointing out you didn't handle the delete event, which introduces a window of time when you don't have a valid row at all: the valid one got deleted, and no insert/update has happened for that person and type to set a new one.
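If you want to close that gap, a minimal sketch of an AFTER DELETE trigger could look like this (the repromote_valid_status name is mine; it reuses the ctid lookup from your function):

CREATE OR REPLACE FUNCTION repromote_valid_status()
RETURNS TRIGGER AS $$
BEGIN
    -- Only act when the deleted row was the valid one.
    IF OLD.valid THEN
        UPDATE training
        SET valid = TRUE
        WHERE ctid = (
            SELECT ctid
            FROM training
            WHERE person_id = OLD.person_id
              AND training_type_id = OLD.training_type_id
            ORDER BY date_expires DESC
            LIMIT 1
        );
    END IF;
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_repromote_valid_status
AFTER DELETE ON training
FOR EACH ROW
EXECUTE FUNCTION repromote_valid_status();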
- How to handle concurrency issues with this setup?
Add a partial unique index:
create unique index on training (person_id, training_type_id) where (valid);
That will prevent multiple clients from setting different rows as valid for the same person and training type whenever they happen to run concurrently targeting the same subset of rows. Only a single record can be valid at a time for any given person and training type.
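As a quick illustration (sample values are made up), the second of these two inserts fails with a duplicate key error once the partial index is in place:

insert into training (person_id, training_type_id, date_expires, valid)
values (1, 1, '2025-01-01', true); -- ok: first valid row for this group
insert into training (person_id, training_type_id, date_expires, valid)
values (1, 1, '2026-01-01', true); -- rejected by the partial unique index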
Also, lock your rows explicitly in a before trigger. Note that since you're in PL/pgSQL and you don't need to save those rows into anything, you need perform rather than select:
perform from training
where person_id = NEW.person_id
  and training_type_id = NEW.training_type_id
for update; -- locked exclusively until the end of transaction
Two clients operating concurrently could both compare their payload to what's in the table from their AFTER perspective and both conclude their new row is the newest one. The index stops them from setting both as the current valid row, but it doesn't guarantee they'll do their set/unset in the right order, only that at most one is set at a time. The explicit lock tells concurrent clients trying to affect the same person and training type combination to queue up, wait, then proceed in sequence, one by one.
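Wired up as a BEFORE trigger, that could look like the following sketch (the function and trigger names are mine):

CREATE OR REPLACE FUNCTION lock_training_group()
RETURNS TRIGGER AS $$
BEGIN
    -- Serialise writers touching the same person and training type.
    PERFORM FROM training
    WHERE person_id = NEW.person_id
      AND training_type_id = NEW.training_type_id
    FOR UPDATE; -- locked exclusively until the end of transaction
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_lock_training_group
BEFORE INSERT OR UPDATE ON training
FOR EACH ROW
EXECUTE FUNCTION lock_training_group();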
- Any optimizations for large datasets?
Add AND valid:
UPDATE training
SET valid = FALSE
WHERE person_id = NEW.person_id
  AND training_type_id = NEW.training_type_id
  AND valid;
And you're saving a rewrite of all rows for this person and type. Overwriting a value with the same value isn't a no-op: it still has to get to the page, mark the old tuple for deletion and write a new one, on the same page or another. The addition of AND valid lets it unset the old and set the new valid row without troubling any innocent bystander rows.
A much larger improvement would be keeping the one valid training per person_id and training_type_id in a separate table (or even just an FK to it) and letting the trigger only check whether the incoming row has a higher date_expires, swapping it out if needed. If you really need to have the valid flag in your main training table, let the trigger unset the old valid and set the new one when doing the swap, still based on that small, separate table. That saves you:
- The training table update changing false to false mentioned earlier. Instead, you can avoid writing anything to training from this trigger at all if you keep the valid column on the tiny, separate table (which also prevents trigger self-fire completely).
- count(*). If you make the small table pull double duty as a tally table by letting the trigger increase/decrease a count column, you can sum(count) from the tally table whenever you need the training table's count(*). Or keep a count per person, per training type, or per combination of both.
Another thing that would make it simpler, cleaner and faster is a statement-level trigger: an insert of 100 rows fires a row-level trigger 100 times, while a statement-level trigger always fires just once per statement, regardless of the size of your payload. Both ideas are sketched together below.
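Here's a rough sketch combining both ideas - a tiny side table holding the current valid row per group, maintained by a statement-level trigger reading its transition table. All names (valid_training, refresh_valid_training, inserted) are my own, and it only covers the insert path:

CREATE TABLE valid_training
(
    person_id        INT NOT NULL,
    training_type_id INT NOT NULL,
    training_id      INT NOT NULL REFERENCES training(id),
    date_expires     DATE NOT NULL,
    PRIMARY KEY (person_id, training_type_id)
);

CREATE OR REPLACE FUNCTION refresh_valid_training()
RETURNS TRIGGER AS $$
BEGIN
    -- DISTINCT ON keeps one candidate per group, so ON CONFLICT never
    -- sees the same key twice within a single command.
    INSERT INTO valid_training (person_id, training_type_id, training_id, date_expires)
    SELECT DISTINCT ON (person_id, training_type_id)
           person_id, training_type_id, id, date_expires
    FROM inserted
    ORDER BY person_id, training_type_id, date_expires DESC
    ON CONFLICT (person_id, training_type_id) DO UPDATE
    SET training_id  = excluded.training_id,
        date_expires = excluded.date_expires
    WHERE valid_training.date_expires < excluded.date_expires;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_refresh_valid_training
AFTER INSERT ON training
REFERENCING NEW TABLE AS inserted
FOR EACH STATEMENT
EXECUTE FUNCTION refresh_valid_training();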
Upvotes: 1