Gofven
Gofven

Reputation: 59

Is SELECT "faster" than function with nested INSERT?

I'm using a function that inserts a row to a table if it doesn't exist, then returns the id of the row.

Whenever I put the function inside a SELECT statement, with values that don't exist in the table yet, e.g.:

SELECT * FROM table WHERE id = function(123);

... it returns an empty row. However, running it again with the same values will return the row with the values I want to see.

Why does this happen? Is the INSERT running behind the SELECT speed? Or does PostgreSQL cache the table when it didn't exist, and at next run, it displays the result?

Here's a ready to use example of how this issue can occur:

CREATE TABLE IF NOT EXISTS test_table(
id INTEGER,
tvalue boolean
);

CREATE OR REPLACE FUNCTION test_function(user_id INTEGER)
    RETURNS integer
    LANGUAGE 'plpgsql'
AS $$
DECLARE
    __user_id INTEGER;

BEGIN
    EXECUTE format('SELECT * FROM test_table WHERE id = $1')
    USING user_id
    INTO __user_id;

    IF __user_id IS NOT NULL THEN
        RETURN __user_id;

    ELSE
        INSERT INTO test_table(id, tvalue)
        VALUES (user_id, TRUE) 
        RETURNING id
        INTO __user_id;
        RETURN __user_id;
    END IF;
END;
$$;

Call:

SELECT * FROM test_table WHERE id = test_function(4);

To reproduce the issue, pass any integer that doesn't exist in the table, yet.

Upvotes: 1

Views: 125

Answers (1)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656421

The example is broken in multiple places.

  • No need for dynamic SQL with EXECUTE.
  • SELECT * in the function is wrong.
  • Your table definition should have a UNIQUE or PRIMARY KEY constraint on (id).
  • Most importantly, the final SELECT statement is bound to fail. Since the function is VOLATILE (has to be), it is evaluated once for every existing row in the table. Even if that worked, it would be a performance nightmare. But it does not. Like @user2864740 commented, there is also a problem with visibility. Postgres checks every existing row against the result of the function, which in turn adds 1 or more rows, and those rows are not yet in the snapshot the SELECT is operating on.

    SELECT * FROM test_table WHERE id = test_function(4);

This would work (but see below!):

CREATE TABLE test_table (
  id     int PRIMARY KEY  --!
, tvalue bool
);

CREATE OR REPLACE FUNCTION test_function(_user_id int)
  RETURNS test_table LANGUAGE sql AS
$func$
   WITH ins AS (
      INSERT INTO test_table(id, tvalue)
      VALUES (_user_id, TRUE) 
      ON CONFLICT DO NOTHING
      RETURNING *
      )
   TABLE ins
   UNION ALL
   SELECT * FROM test_table WHERE id = _user_id
   LIMIT 1
$func$;

And replace your SELECT with just:

SELECT * FROM test_function(1);

db<>fiddle here

Related:

There is still a race condition for concurrent calls. If that can happen, consider:

Upvotes: 1

Related Questions