Check if table is empty in runtime

I am trying to write a script which drops some obsolete tables in Postgres database. I want to be sure the tables are empty before dropping them. I also want the script could be kept in our migration scripts where it is safe to run even after these tables are actually dropped.

There is my script:

CREATE OR REPLACE FUNCTION __execute(TEXT) RETURNS VOID AS $$
  BEGIN EXECUTE $1; END;
$$ LANGUAGE plpgsql STRICT;

CREATE OR REPLACE FUNCTION __table_exists(TEXT, TEXT) RETURNS bool as $$
SELECT exists(SELECT 1 FROM information_schema.tables WHERE (table_schema, table_name, table_type) = ($1, $2, 'BASE TABLE'));
$$ language sql STRICT;

CREATE OR REPLACE FUNCTION __table_is_empty(TEXT) RETURNS bool as $$
  SELECT not exists(SELECT 1 FROM  $1 ); 
$$ language sql STRICT;

-- Start migration here

SELECT __execute($$
   DROP TABLE oldtable1;
$$)
WHERE __table_exists('public', 'oldtable1')
 AND __table_is_empty('oldtable1');

-- drop auxilary functions here

And finally I got:

ERROR:  syntax error at or near "$1"
LINE 11:   SELECT not exists(SELECT 1 FROM  $1 );

Is there any other way?

Upvotes: 3

Answers (3)

Erwin Brandstetter

Reputation: 657982

No, no, no. For many reasons.

@kordirko already pointed out the immediate cause for the error message: In plain SQL, variables can only be used for values not for key words or identifiers. You can fix that with dynamic SQL, but that still doesn't make your code right.

You are applying programming paradigms from other programming languages. With PL/pgSQL, it is extremely inefficient to split your code into multiple separate tiny sub-functions. The overhead is huge in comparison.

Your actual call is also a time bomb. Expressions in the WHERE clause are executed in any order, so this may or may not raise an exception for non-existing table names:

WHERE __table_exists('public', 'oldtable1')
 AND __table_is_empty('oldtable1');

... which will roll back your whole transaction.

Finally, you are completely open to race conditions. Like @Frank already commented, a table can be in use by concurrent transactions, in which case open locks may stall your attempt to drop the table. Could also lead to deadlocks (which the system resolves by rolling back all but one competing transactions). Take out an exclusive lock yourself, before you check whether the table is (still) empty.

Proper function

This is safe for concurrent use. It takes an array of table names (and optionally a schema name) and only drops existing, empty tables that are not locked in any way:

CREATE OR REPLACE FUNCTION f_drop_tables(_tbls text[] = '{}'
                                       , _schema text = 'public'
                                       , OUT drop_ct int)  AS
$func$
DECLARE
   _tbl   text;                             -- loop var
   _empty bool;                             -- for empty check
BEGIN
   drop_ct := 0;                            -- init!
   FOR _tbl IN 
      SELECT quote_ident(table_schema) || '.'
          || quote_ident(table_name)        -- qualified & escaped table name
      FROM   information_schema.tables
      WHERE  table_schema = _schema
      AND    table_type   = 'BASE TABLE'
      AND    table_name   = ANY(_tbls)
   LOOP
      EXECUTE 'SELECT NOT EXISTS (SELECT 1 FROM ' || _tbl || ')'
      INTO _empty;                          -- check first, only lock if empty

      IF _empty THEN
         EXECUTE 'LOCK TABLE ' || _tbl;     -- now table is ripe for the plucking

         EXECUTE 'SELECT NOT EXISTS (SELECT 1 FROM ' || _tbl || ')'
         INTO _empty;                       -- recheck after lock

         IF _empty THEN
            EXECUTE 'DROP TABLE ' || _tbl;  -- go in for the kill
            drop_ct := drop_ct + 1;         -- count tables actually dropped
         END IF;
      END IF;
   END LOOP;
END
$func$ LANGUAGE plpgsql STRICT;

Call:

SELECT f_drop_tables('{foo1,foo2,foo3,foo4}');

To call with a different schema than the default 'public':

SELECT f_drop_tables('{foo1,foo2,foo3,foo4}', 'my_schema');

Major points

Reports the number of tables actually dropped. (Adapt to report info of your choice.)
Using the information schema like in your original. Seems the right choice here, but be aware of subtle limitations:
- How to check if a table exists in a given schema
For use under heavy concurrent load (with long transactions), consider the NOWAIT option for the LOCK command and possibly catch exceptions from it.
Per documentation on "Table-level Locks":

ACCESS EXCLUSIVE

Conflicts with locks of all modes (ACCESS SHARE, ROW SHARE, ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE). This mode guarantees that the holder is the only transaction accessing the table in any way.

Acquired by the ALTER TABLE, DROP TABLE, TRUNCATE, REINDEX, CLUSTER, and VACUUM FULL commands. This is also the default lock mode for LOCK TABLE statements that do not specify a mode explicitly.

Bold emphasis mine.

Upvotes: 3

Mike Jones

Reputation: 532

I work on DB2 so I can't say for sure if this will work on a Postgres database, but try this:

select case when count(*) > 0 then True else False end from $1

...in place of:

SELECT not exists(SELECT 1 FROM  $1 )

If Postgres does not have a CASE / END expression capability, I'd be shocked if it didn't have some kind of similar IF/THEN/ELSE like expression ability to use as a substitute.

Upvotes: -2

krokodilko

Reputation: 36117

You must use EXECUTE if you want to pass a table name as parameter in a Postgres function.
Example:

CREATE OR REPLACE FUNCTION __table_is_empty(param character varying) 
RETURNS bool
AS $$
DECLARE
 v int;
BEGIN
      EXECUTE 'select 1 WHERE EXISTS( SELECT 1 FROM ' || quote_ident(param) || ' ) '
            INTO v;
      IF v THEN return false; ELSE return true; END IF;
END;
$$ LANGUAGE plpgsql;
/

Demo: http://sqlfiddle.com/#!12/09cb0/1

Upvotes: 3

Check if table is empty in runtime

Answers (3)

Proper function

Major points

Related Questions