A H
A H

Reputation: 23

Capture number of rows affected by dynamic sql?

I am trying to get the return from a QUERY EXEUTE in a plpgsql function to be able to check how many rows were affected from a dynamic update query. My use case is adding an event (with a custom payload) to a separate table on insert or update to a dynamically set table. Because my event has a custom payload, I have not been able to use a database trigger (e.g. trigger before insert). As a simplified example, assume I have this table:

CREATE TABLE users (user_id text primary key, name text)

Here is my simplified events table:

CREATE TABLE events(event_id text primary key, payload json)

Here is my simplified function:

CREATE OR REPLACE FUNCTION my_function(_rowtype anyelement, q text, payload jsonb)
    RETURNS SETOF anyelement AS
$func$
DECLARE
    event_id text;
BEGIN
    SELECT jsonb_object_field_text (payload, 'id')::text INTO STRICT event_id;
    execute format('insert into event(event_id, payload) values ($1, $2)') using event_id, payload;
    RETURN QUERY EXECUTE format('%s', q);
END
$func$ LANGUAGE plpgsql;

The goal is to have this work exactly the same as if someone had created these in a transaction. In pseucode for insert:

BEGIN
insert into events(id, payload) values($1, $2)
insert into users(columns) values(<any values>)
COMMIT

and similarly for update:

BEGIN
insert into events(id, payload) values($1, $2)
result, error := query(`update users set name = 'hello' where id = 'Not Exists Thus No Rows Modified'`);
if result.rowsAffected() == 0 {
   ROLLBACK
}
COMMIT

The function my_function almost works except for one edge case: when an update actually doesn't affect any rows. For example, this works:

select * from my_function(NULL::users, 
       'insert into users(id,name) values('u1', ''a2'') returning *',
       payload => '{"id": "e1", "custom": "s1", "field": "2019-10-12T07:20:50.52Z"}')      

As expected, after this is done both a row in the users table and the events table is created. What fails is the following:

select * from my_function(NULL::users, 
       'update users set name = ''hello'' where user_id = ''NotExists'' returning *',
       payload => '{"id": "e2", "custom": "s3", "field": "2019-10-12T07:20:50.52Z"}')     

Here, a row is created in the events table (my goal is that it should not be created). I know this approach is not elegant, and I know this is vulnerable to SQL injection. I'd love suggestions on better ways to solve this (including scrapping what we're doing now). But to answer the question directly, I'm looking to store the result of QUERY EXECUTE, check if any rows were affected, and raise an error so that there is never a case where a row in the events table is created when there is not real corresponding change in the users table. Users table is just an example, in general, it could be any dynamically set table.

Upvotes: 1

Views: 964

Answers (2)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 658462

CREATE OR REPLACE FUNCTION my_function(_rowtype anyelement, q text, payload jsonb)
  RETURNS SETOF anyelement
  LANGUAGE plpgsql AS
$func$
BEGIN
   RETURN QUERY EXECUTE q;

   IF NOT FOUND THEN 
      RETURN;  -- nothing happened yet, we can exit silently.
      -- Or you WANT an error for this case. Then do this instead:
      -- RAISE EXCEPTION 'Query passed in parameter "q" did not affect any rows. Doing nothing!';
   END IF;
   
   INSERT INTO event(event_id, payload)
   VALUES (payload->>'id', payload);
END
$func$;

As has been commented, RETURN QUERY does not return from the function. The manual:

RETURN NEXT and RETURN QUERY do not actually return from the function — they simply append zero or more rows to the function's result set. Execution then continues with the next statement in the PL/pgSQL function. As successive RETURN NEXT or RETURN QUERY commands are executed, the result set is built up. A final RETURN, which should have no argument, causes control to exit the function (or you can just let control reach the end of the function).

There's a code example for your case exactly at the bottom of that chapter in the manual. From me, actually. Originating here:

It was suggested to use GET DIAGNOSTICS instead of the simpler FOUND. It's true that EXECUTE does not set the state of FOUND. But RETURN QUERY does. So keep using the simpler FOUND. Related:

You have format() in your original twice. And while that's typically very useful for dynamic SQL, it's useless in your case. EXECUTE format('%s', q) is exactly the same as just EXECUTE q, with added cost. Both are open doors for SQL injection when passing user input.

While there is a good chance that the transaction might be rolled back, start with the critical step, and do the rest later. Avoid wasting the work. So I moved executing q to the top. Assuming it does not depend on the "payload" row, now inserted later.

Also, INSERT INTO events can be plain SQL. Nothing dynamic there. No need for format() or EXECUTE.

Finally, assuming your jsonb_object_field_text (payload, 'id')::text is just a fancy way of saying payload->>'id'. No need for an additional variable and another SELECT INTO.

Warning against SQL injection

Converting user input (parameter q in the example) to code to execute dynamically is the most direct SQL injection vulnerability of all. I wouldn't want to be caught in my underwear doing that.

Upvotes: 0

Jaime Casanova
Jaime Casanova

Reputation: 66

A RETURN QUERY doesn't need to go to the end of the function, it only says: "the results of this query are part of the resulting set".

So you can use the RETURN QUERY, ask for FOUND and act accordingly. Here is your function modified for working this way:

CREATE OR REPLACE FUNCTION public.my_function(_rowtype anyelement, q text, payload jsonb)
 RETURNS SETOF anyelement
 LANGUAGE plpgsql
AS $function$
DECLARE
    event_id text;
BEGIN
    SELECT jsonb_object_field_text (payload, 'id')::text INTO STRICT event_id;

    RETURN QUERY EXECUTE format('%s', q);
    IF FOUND THEN
        execute format('insert into events(event_id, payload) values ($1, $2)') using event_id, payload;
    END IF;

    RETURN;
END
$function$

PD: Maybe you can also solve your problem with triggers FOR EACH STATEMENT using the transition tables OLD and NEW (which are available since v10, https://www.postgresql.org/docs/10/sql-createtrigger.html)

Upvotes: 1

Related Questions