Jatin Bodarya

Reputation: 1455

Load contents from a CSV file into a PostgreSQL table

Below is a description of the procedure I went through while trying to load data from a file into a PostgreSQL 8.0 database running on a Red Hat Linux 7.2 host.

Now, my issue is that the FOR EACH ROW trigger is getting called, and the procedure is executing, once for every inserted row.

What I'd like it to do instead is check the appropriate row of my table once I have supplied the filename and decide, based on the contents of that record, whether to do a DUMP BULK DATA or a DUMP WHOLE CSV FILE, only once per load rather than on every trigger invocation.

Please help me solve this issue...

My logfile.tmp is as follows:

27/Apr/2013:17:03:42 +0530#192.168.1.3#16#0@#$http://localhost/images/banner-left.jpg@#$10.1ff.ff.ff#-#Y#-
27/Apr/2013:17:03:42 +0530#192.168.1.3#16#0@#$http://localhost/images/banner-left.jpg@#$10.ff.ff.2ff05#-#Y#-

The COPY command I am using:

/usr/local/pgsql/bin/psql localhost -d d1 -U u1 -tc "COPY tblaccesslog ( accesstime, clientip, username, request,bytes, urlpath, url, contenttype, issite, webcatname)  FROM 'logfile.tmp' WITH DELIMITER AS '#';" >> /tmp/parselog.log 2>&1

The trigger (insert_accesslog_trigger) in question:

CREATE TRIGGER insert_accesslog_trigger BEFORE INSERT ON tblaccesslog
    FOR EACH ROW EXECUTE PROCEDURE accesslog_insert_trigger();

and finally the trigger function (accesslog_insert_trigger()) being used:

CREATE OR REPLACE FUNCTION accesslog_insert_trigger() RETURNS trigger AS $$
DECLARE
    tablemaxtuples NUMERIC(10);
    tableno        NUMERIC(10);
    newtable       TEXT;
    query          TEXT;
    tablecount     NUMERIC(10);
    min_limit      NUMERIC(10);
    max_limit      NUMERIC(10);
BEGIN
    tablemaxtuples := 100000;
    tableno  := ( NEW.id - ( NEW.id % tablemaxtuples ) ) / tablemaxtuples + 1;
    newtable := 'tblaccesslog' || to_char(CURRENT_DATE, 'YYYYMMDD')
             || '_child_' || tableno;

    -- Create the child table the first time an id in its range arrives.
    SELECT count(*) INTO tablecount FROM pg_tables WHERE tablename = newtable;
    IF tablecount = 0 THEN
        min_limit := (tableno - 1) * tablemaxtuples;
        max_limit := min_limit + tablemaxtuples;
        query := 'CREATE TABLE ' || newtable
              || ' ( PRIMARY KEY (id), CHECK ( id >= ' || min_limit
              || ' AND id < ' || max_limit || ' ) ) INHERITS (tblaccesslog)';
        EXECUTE query;
    END IF;

    -- Redirect the row into the appropriate child table.
    query := 'INSERT INTO ' || newtable
          || ' ( id, username, clientip, url, accesstime, requestbytes,'
          || ' contenttype, issite, urlpath, webcatname ) VALUES ('
          || NEW.id
          || ',''' || NEW.username     || ''''
          || ',''' || NEW.clientip     || ''''
          || ',''' || NEW.url          || ''''
          || ',''' || NEW.accesstime   || ''''
          || ',''' || NEW.requestbytes || ''''
          || ',''' || NEW.contenttype  || ''''
          || ',''' || NEW.issite       || ''''
          || ',''' || replace(NEW.urlpath, '''', '') || ''''
          || ',''' || NEW.webcatname   || ''' )';
    EXECUTE query;

    -- Returning NULL stops the row from also landing in the parent table.
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

Upvotes: 1

Views: 1280

Answers (1)

IMSoP

Reputation: 97996

The PostgreSQL documentation's overview of triggers makes it clear that no type of trigger suits your requirements: a FOR EACH ROW trigger will, as its name says, be executed once for each row, and, as the manual page states, "Statement-level triggers do not currently have any way to examine the individual row(s) modified by the statement."
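For illustration, the two granularities differ only in the FOR EACH clause; the second trigger and its function name below are hypothetical:

-- Row-level: fires once per inserted row; NEW is available.
CREATE TRIGGER insert_accesslog_trigger BEFORE INSERT ON tblaccesslog
    FOR EACH ROW EXECUTE PROCEDURE accesslog_insert_trigger();

-- Statement-level: fires once per statement rather than once per row,
-- but, as the docs note, it cannot examine the inserted rows.
CREATE TRIGGER accesslog_statement_trigger BEFORE INSERT ON tblaccesslog
    FOR EACH STATEMENT EXECUTE PROCEDURE accesslog_statement_fn();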

However, what you can do instead is put your actual COPY command inside a function. The function could COPY the file into a temporary table and then perform the appropriate steps to determine what should happen to the data from there.

Then your copy command (which I'm guessing runs from a cron job or similar) would just run SELECT bulk_insert_access_log(); rather than the long psql one-liner listed above.
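For concreteness, a rough sketch of such a function might look like the following. The table and column names are taken from the question, but the function name, the file path, and the "decide once" rule are placeholders for your own logic:

CREATE OR REPLACE FUNCTION bulk_insert_access_log() RETURNS void AS $$
DECLARE
    staged BIGINT;
BEGIN
    -- Stage the whole file with a single COPY. ON COMMIT DROP means the
    -- temp table disappears when the transaction ends, so the function
    -- can be called repeatedly from the same session.
    CREATE TEMPORARY TABLE tmp_accesslog (LIKE tblaccesslog) ON COMMIT DROP;

    COPY tmp_accesslog ( accesstime, clientip, username, request, bytes,
                         urlpath, url, contenttype, issite, webcatname )
        FROM '/tmp/logfile.tmp' WITH DELIMITER AS '#';

    SELECT count(*) INTO staged FROM tmp_accesslog;

    -- The decision now happens exactly once per file, not once per row.
    -- The threshold is only a placeholder for whatever check should
    -- distinguish DUMP BULK DATA from DUMP WHOLE CSV FILE.
    IF staged > 100000 THEN
        RAISE NOTICE 'taking the bulk path for % rows', staged;
    ELSE
        RAISE NOTICE 'loading the whole file as one batch (% rows)', staged;
    END IF;

    INSERT INTO tblaccesslog ( accesstime, clientip, username, request, bytes,
                               urlpath, url, contenttype, issite, webcatname )
        SELECT accesstime, clientip, username, request, bytes,
               urlpath, url, contenttype, issite, webcatname
        FROM tmp_accesslog;
END;
$$ LANGUAGE plpgsql;

Note that a server-side COPY FROM a file reads the file on the database server and needs the same privileges as your existing one-liner, so the cron entry only changes to something like psql -d d1 -U u1 -c "SELECT bulk_insert_access_log();".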

Upvotes: 2
