Function to create indexes using schema variables

Question

I'm currently trying to create a function that will create an index on a table in every schema in the data warehouse. This is the script I have so far:

create or replace function dwh.loan_type_id_indexing()
returns void language plpgsql AS
$PROC$
Declare
       myschema varchar;
               sql text;        
Begin 
    for myschema in 
        SELECT nspname
          FROM pg_catalog.pg_namespace 
         where nspname not in ('information_schema', 'pg_catalog', 'pg_temp_1',
                               'pg_temp_7', 'pg_toast', 'pg_toast_temp_1',
                               'pg_toast_temp_7','public', 'c1', 'dwh',
                               'users', 'c2'
                              )
         order by nspname
    loop        
        sql = 'CREATE INDEX '|| myschema || '_' ||'type_id ON '|| 
        myschema || '.' ||'.fact_tbl USING btree (loan_type_id)';

        execute sql;

    end loop;
END
$PROC$
volatile;

I know this isn't correct but it will give you my thought process on what I'm trying to do.

Craig Ringer · Accepted Answer

Instead of filtering out schemas and assuming every other one has the the table you want, query information_schema for them and loop over the resulting list:

select t.table_schema
from information_schema.tables t inner join information_schema.columns c 
  on (t.table_schema = c.table_schema and t.table_name = c.table_name) 
where t.table_name = 'fact_loan' and c.column_name = 'loan_type_id'
  and t.table_schema NOT LIKE 'pg_%'
  and t.table_schema NOT IN ('information_schema', 'ad_delivery', 'dwh', 'users', 'wand');

You now have everything you need to create the index with EXECUTE by looping over the records returned from the query.

You might want to RAISE NOTICE 'Creating index on %s.fact_loan.loan_type_id', table_schema; to allow you to track progress, too, as index builds can take a while.

If you were going to filter schemas, you'd be better off using schemaname NOT LIKE 'pg_%' AND lower(shemaname) <> 'information_schema' as shown above.

BTW, I usually find this sort of job more convenient to do from a script outside the database where I have access to multiple connections, threading/multiprocessing, etc. A quick Python script with the psycopg2 driver for Pg would let you bang something like this together that does the index builds in parallel say 4 at a time; the right number would depend on your disk configuration.

Function to create indexes using schema variables

Answers (1)

Related Questions