Reputation: 1885

Postgresql generic trigger function?

I am new to Postgresql, so please indulge if my question makes no sense.

I am trying to find a way to migrate my DB structure to Postgresql, in particular, I find functions quite convenient and would like to make my numerous triggers easier to write.

In my DB, I use the standard last_modified and last_modified_by fields to keep track of changes. I also use a standard primary key with incremental sequence.

There is built-in syntax to link sequences to primary key id's, but as I have to write triggers for the last_modified fields anyway, I was wondering if it was possible to have a generic function to update all at once.

Example: Table ANIMAL has fields AMIMAL_ID (primary key, with sequence SEQ_ANIMAL), fields LAST_MODIFIED and LAST_MODIFIED_BY.

Similarly, I have a table PLANT with fields PLANT_ID (primary key, with sequence SEQ_PLANT), fields LAST_MODIFIED and LAST_MODIFIED_BY again.

I would like to create a generic function to be called in the 4 triggers I need to create. I was hoping to get something like this:

Before insert function:

CREATE FUNCTION TRIGGER_BI(p_pkField text, p_Sequence text) RETURNS TRIGGER AS $$
DECLARE
        curtime timestamp := now();
BEGIN
    NEW.LAST_UPDATED := curtime;
    NEW.LAST_UPDATED_BY := current_user;
    NEW.p_pkField := nextval(p_Sequence);
END;
$$
LANGUAGE 'plpgsql' IMMUTABLE
SECURITY DEFINER;

Before update function:

CREATE FUNCTION TRIGGER_BU() RETURNS TRIGGER AS $$
DECLARE
        curtime timestamp := now();
BEGIN
    NEW.LAST_UPDATED := curtime;
    NEW.LAST_UPDATED_BY := current_user;
END;
$$
LANGUAGE 'plpgsql' IMMUTABLE
SECURITY DEFINER;

And now, the triggers of table ANIMAL: Before Insert:

CREATE TRIGGER ANIMAL
BEFORE INSERT
    FOR EACH ROW EXECUTE PROCEDURE TRIGGER_BI("ANIMAL_ID", "SEQ_ANIMAL");

Before Update:

CREATE TRIGGER ANIMAL
    BEFORE UPDATE
        FOR EACH ROW EXECUTE PROCEDURE TRIGGER_BU();

Triggers of table PLANT: Before Insert:

CREATE TRIGGER PLANT
BEFORE INSERT
    FOR EACH ROW EXECUTE PROCEDURE TRIGGER_BI("PLANT_ID", "SEQ_PLANT");

Before Update:

CREATE TRIGGER PLANT
    BEFORE UPDATE
        FOR EACH ROW EXECUTE PROCEDURE TRIGGER_BU();

Is it possible, in any way, to get something generic this way?

YES! What is the correct syntax? BONUS: it is event possible to have a single function to do all the work, with default empty parameters that, if empty, would not update the sequence.

YES, BUT WAIT! what are the down-sides of such an approach? (performance, security, anything else to be considered)?

NO! So I really need on function for each trigger?

UPDATE: I explicitly create sequences because I may want to have shared sequences among several tables. The idea is to use the shared sequence as a unique parent table with several child tables having a foreign key on their primary key to the parent table. Don't hesitate to comment on this approach, but my basic understanding is that accessing a sequence's next value is much more efficient than having to manage foreign keys...

UPDATE 2: I found some quite interesting stuff which nearly gets me there - only that my setValue function isn't working...

Here the generic trigger:

CREATE OR REPLACE FUNCTION TRIGGER_FUNC() RETURNS TRIGGER AS $$
DECLARE
    p_pkField  text;
    p_Sequence text;
    pkValue    int;
BEGIN
    EXECUTE format('SELECT ($1).%I::int', TG_ARGV[0]) USING NEW INTO pkValue;
    p_Sequence := quote_ident(TG_ARGV[1]);
    IF pkValue IS NULL THEN
        SELECT setfieldValue(pg_typeof(NEW), TG_ARGV[0], nextval(p_Sequence));
    END IF;
    NEW.LAST_UPDATED := curtime;
    NEW.LAST_UPDATED_BY := current_user;
    RETURN NEW;
END;
$$
LANGUAGE 'plpgsql' IMMUTABLE
SECURITY DEFINER;

I found a hint to the solution of the setValue function here and tried adapting it, but it doesn't work - am I simply using a wrong call? Or can I use some additional knowledge in the method to make it simpler? (I already used the fact that I am setting a bigint value, but I might be able to do better?!)

Here the (non-working) code:

CREATE OR REPLACE FUNCTION public.setfieldValue(anyelement, text, bigint)
 RETURNS anyelement
 LANGUAGE plpgsql
AS $function$
DECLARE 
  _name text;
  _values text[];
  _value text;
  _attnum int;
BEGIN
  FOR _name, _attnum
     IN SELECT a.attname, a.attnum
          FROM pg_catalog.pg_attribute a 
         WHERE a.attrelid = (SELECT typrelid
                               FROM pg_type
                              WHERE oid = pg_typeof($1)::oid) 
  LOOP
    IF _name = $2 THEN
      _value := $3;
    ELSE
      EXECUTE 'SELECT (($1).' || quote_ident(_name) || ')::text' INTO _value USING $1;
    END IF;
    _values[_attnum] :=  COALESCE('"' || replace(replace(_value, '"', '""'), '''', '''''') || '"', ''); 
  END LOOP;
  EXECUTE 'SELECT (' || quote_ident(pg_typeof($1)::text) || ' ''(' || array_to_string(_values,',') || ')'').*' INTO $1; 
  RETURN $1;
END;
$function$;

Upvotes: 7

Answers (2)

klin

Reputation: 121784

No for manually setting default values for primary keys. Declare your tables with primary key as serial (or bigserial) and use the built-in mechanisms to handle such columns. Do not bother about that the values of a primary key are not consecutive. Primary keys are only and exclusively for distinctly identify the row and not for anyting more.

Besides all, you cannot do this in that way because trigger functions cannot have declared arguments.

Yes for setting values of common columns in many tables. You can use the same trigger function for insert and update. Example:

CREATE OR REPLACE FUNCTION generic_trigger() -- function without arguments
RETURNS TRIGGER AS $$
BEGIN
    NEW.last_modified := now();
    NEW.last_modified_by := current_user;
    RETURN NEW;     -- important!
END;
$$
LANGUAGE 'plpgsql';

create table table_a 
    (a_id serial primary key, last_modified timestamp, last_modified_by text);
create table table_b 
    (b_id serial primary key, last_modified timestamp, last_modified_by text);

create trigger table_a_trigger
before insert or update on table_a
for each row execute procedure generic_trigger();

create trigger table_b_trigger
before insert or update on table_b
for each row execute procedure generic_trigger();

insert into table_a default values;

select * from table_a;

 a_id |      last_modified      | last_modified_by 
------+-------------------------+------------------
    1 | 2015-10-26 19:14:34.642 | postgres
(1 row)

Maybe you have really important reason to set values for primary keys in a trigger (see jpmc26's comment). In that case the primary keys should be declared as integer (bigint) without default expression and the trigger function should look like this:

create or replace function generic_trigger()
returns trigger as $$
begin
    new.last_modified := now();
    new.last_modified_by := current_user;
    if tg_op = 'INSERT' then
        if tg_table_name = 'table_a' then 
            new.a_id:= nextval('table_a_a_id_seq');
        elsif tg_table_name = 'table_b' then 
            new.b_id:= nextval('table_b_b_id_seq');
        end if;
    end if;
    return new;
end;
$$
language 'plpgsql';

insert into table_a (a_id) values (15);
select * from table_a;

 a_id |      last_modified      | last_modified_by 
------+-------------------------+------------------
    1 | 2015-10-26 19:14:34.642 | postgres
    2 | 2015-10-26 21:15:49.243 | postgres
(2 rows)

Auto-incrementing surrogate key

SERIAL would be preferable:

CREATE TABLE animal
(
    animal_id SERIAL PRIMARY KEY,
    ...
);

Note that SERIAL is really just syntax sugar. It does a few things for you to save you the hassle:

Creates a sequence
Sets the DEFAULT of animal_id to fetch the next sequence value.
Creates an "ownership" relationship between the column and the sequence. (If you drop the table, the sequence will go with it.)
Makes the column NOT NULL.

There is a concern with using SERIAL, though. If someone INSERTs with an explicit value, then that value will be used instead of generating one from the sequence:

INSERT INTO animal (animal_id, ...) VALUES(50000, ...);

This is not something to be concerned about if you have applications connecting to the database instead of real users; developers know better than to explicitly specify a value they want to be generated. If, however, you have real users logging directly into the database, this can be a concern. The fact you reference CURRENT_USER makes me think this is possibly your use case. A trigger, as you suggest, will solve that problem:

CREATE TABLE animal (animal_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

CREATE SEQUENCE animal_id_seq;
ALTER SEQUENCE animal_id_seq OWNED BY animal.animal_id;

CREATE OR REPLACE FUNCTION generate_animal_id()
  RETURNS TRIGGER
  LANGUAGE plpgsql
  AS $$
  BEGIN
    IF NEW.animal_id IS NOT NULL THEN
        RAISE EXCEPTION 'Cannot specify animal_id on INSERT: %', NEW.animal_id;
    ELSE
        NEW.animal_id = NEXTVAL('animal_id_seq'::regclass);
        RETURN NEW;
    END IF;
  END
  $$
;

CREATE TRIGGER animal_insert_generate_pk
BEFORE INSERT ON animal
FOR EACH ROW
EXECUTE PROCEDURE generate_animal_id();

(Note that I threw an error instead of overwriting if they explicitly tried to set the value. It's generally better to fail fast for your user than to silently change values they specify.)

You do need a FUNCTION per table for this, unfortunately, because each one will need to use a different sequence.

Automatically updating the auditing data

Let me first address the UPDATE. Yes, you need a trigger for this. You can define a single function for all tables here, though:

CREATE OR REPLACE FUNCTION populate_last_updated_columns()
  RETURNS TRIGGER
  LANGUAGE plpgsql
  AS $$
  BEGIN
    IF 
        NEW.last_updated IS NOT NULL
        OR NEW.last_updated_by IS NOT NULL
    THEN
        RAISE EXCEPTION 'Cannot specify last_updated or last_updated_by: %, %', NEW.last_updated, NEW.last_updated_by;
    END IF;
    NEW.last_updated = CURRENT_TIMESTAMP;
    NEW.last_updated_by = CURRENT_USER;
    RETURN NEW;
  END;
  $$
;

CREATE TRIGGER update_animal
BEFORE UPDATE ON animal
FOR EACH ROW
EXECUTE PROCEDURE populate_last_updated_columns();

CREATE TRIGGER update_plant
BEFORE UPDATE ON plant
FOR EACH ROW
EXECUTE PROCEDURE populate_last_updated_columns();

That leaves INSERT. Again, this depends on if you have real users connecting to the DB and writing queries. If not, then you should leverage DEFAULT for simplicity; again, developers are smart enough to not specify these manually and to just let them get filled in:

CREATE TABLE animal
(
    ...
    last_updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
    last_updated_by TEXT NOT NULL DEFAULT CURRENT_USER,
    ...
);

But if you have users connecting directly and writing their own queries, then it might make sense to use a TRIGGER to prevent them from specifying these values manually. You can reuse the same function as the UPDATE trigger:

CREATE TRIGGER animal_insert_last_updated_cols
BEFORE INSERT ON animal
FOR EACH ROW
EXECUTE PROCEDURE populate_last_updated_columns();

Note that PostgreSQL allows you to define multiple triggers for the same event. (They get executed in alphabetical order.) In this case, I would prefer to go ahead and keep the triggers separate. Since these two triggers cannot possibly affect the same data on a given row, this would be easier to maintain. It reduces the amount of code and it means that changes to the populate_last_updated_columns function will update all triggers with minimum hassle. I would not be concerned about the performance of multiple triggers unless you expect dozens of INSERTs per second. If so, then benchmark and see if the performance impact is a problem for you.

Upvotes: 2

Postgresql generic trigger function?

Answers (2)

Auto-incrementing surrogate key

Automatically updating the auditing data

Related Questions