ConanTheGerbil

Reputation: 787

Postgres - Bulk transfer of data from one table to another

I need to transfer a large amount of data (several million rows) from one table to another. So far I’ve tried doing this….

INSERT INTO TABLE_A (field1, field2) 
SELECT field1, field2 FROM TABLE_A_20180807_BCK;

This worked (eventually) for a table with about 10 million rows in it (took 24 hours). The problem is that I have several other tables that need the same process applied and they’re all a lot larger (the biggest is 20 million rows). I have attempted a similar load with a table holding 12 million rows and it failed to complete in 48 hours so I had to cancel it.

Other issues that probably affect performance: 1) TABLE_A has a field based on an auto-generated sequence, and 2) TABLE_A has an AFTER INSERT trigger that parses each new record and adds a second record to TABLE_B.
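For illustration, a trigger of that kind might look something like the following. This is only a sketch; the real function parses the record differently and the names here are placeholders, not the actual definitions.

CREATE OR REPLACE FUNCTION table_a_to_table_b() RETURNS trigger AS $$
BEGIN
    -- Hypothetical body: copy a couple of fields from the new TABLE_A row into TABLE_B
    INSERT INTO table_b (field1, field2) VALUES (NEW.field1, NEW.field2);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER table_a_after_insert
    AFTER INSERT ON table_a
    FOR EACH ROW EXECUTE PROCEDURE table_a_to_table_b();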

A number of other threads have suggested doing a pg_dump of TABLE_A_20180807_BCK and then loading the data back into TABLE_A. I’m not sure a pg_dump would actually work for me because I’m only interested in a couple of fields from TABLE_A, not the whole lot.

Instead I was wondering about the following….

Export to a CSV file…..

COPY TABLE_A_20180807_BCK (field1,field2) to 'd:\tmp\dump\table_a.dump' DELIMITER ',' CSV;

Import back into the desired table….

COPY TABLE_A (field1,field2) FROM 'd:\tmp\dump\table_a.dump' DELIMITER ',' CSV;

Is the export/import method likely to be any quicker? I’d like some guidance on this before I start another job that may take days to run and may not even work any better! The obvious answer of "just try it and see" isn't really an option; I can't afford more downtime!

(this is a follow-on question from this one, if any background details are required)

Update.... I don't think there are any significant problems with the trigger. Under normal circumstances records are inserted into TABLE_A at a rate of about 1000/sec (including trigger time). I think the issue is likely to be the size of the transaction: under normal circumstances records are inserted in blocks of 100 records per INSERT, whereas the statement shown above attempts to add 10 million records in a single transaction. My guess is that this is the problem, but I've no way of knowing if it really is, or whether there's a suitable workaround (or if the export/import method I've proposed would be any quicker).
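If transaction size really is the culprit, splitting the INSERT into smaller batches might look something like this. It assumes the backup table has an integer id column to range over, which is an assumption on my part, not something guaranteed above.

-- Each statement run on its own commits as its own transaction.
INSERT INTO TABLE_A (field1, field2)
SELECT field1, field2 FROM TABLE_A_20180807_BCK
WHERE id >= 1 AND id < 1000000;

INSERT INTO TABLE_A (field1, field2)
SELECT field1, field2 FROM TABLE_A_20180807_BCK
WHERE id >= 1000000 AND id < 2000000;

-- ...and so on, one statement per range.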

Maybe I should have emphasized this earlier: every insert into TABLE_A fires a trigger that adds a record to TABLE_B. It's the data in TABLE_B that's the final objective, so disabling the trigger isn't an option! This whole problem came about because I accidentally disabled the trigger for a few days, and the preferred solution to the question 'how to run a trigger on existing rows' seemed to be 'remove the rows and add them back again' - see the original post (link above) for details.

My current attempt involves using the COPY command with a WHERE clause to split the contents of TABLE_A_20180807_BCK into a dozen small files and then re-load them one at a time. This may not give me any overall time saving, but although I can't afford 24 hours of continuous downtime, I can afford 6 hours of downtime for 4 nights.
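For reference, COPY ... TO doesn't accept a WHERE clause directly, but it does accept a query, so each chunk might look something like this. The id ranges and file names are purely illustrative.

COPY (SELECT field1, field2 FROM TABLE_A_20180807_BCK WHERE id < 1000000)
    TO 'd:\tmp\dump\table_a_part01.dump' DELIMITER ',' CSV;

COPY (SELECT field1, field2 FROM TABLE_A_20180807_BCK WHERE id >= 1000000 AND id < 2000000)
    TO 'd:\tmp\dump\table_a_part02.dump' DELIMITER ',' CSV;

-- ...one file per range, then a separate COPY TABLE_A ... FROM run for each file.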

Upvotes: 3

Views: 6774

Answers (1)

Joe Love

Reputation: 5972

Preparation: if you have access and can restart your server, set checkpoint_segments to 32 or perhaps more. This will reduce the frequency of checkpoints during this operation. You can undo it when you're finished. This step is not strictly necessary, but it should speed up writes considerably.

edit postgresql.conf and set checkpoint_segments to 32 or maybe more
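Note that checkpoint_segments only exists up to PostgreSQL 9.4; from 9.5 onwards it was removed and the rough equivalent is raising max_wal_size. A sketch of the relevant postgresql.conf lines, with illustrative values:

# PostgreSQL 9.4 and earlier
checkpoint_segments = 32

# PostgreSQL 9.5 and later (checkpoint_segments no longer exists)
max_wal_size = 2GB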

Step 1: drop/delete all indexes and triggers on table A.
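For example (index and trigger names are placeholders; disabling a trigger instead of dropping it saves re-creating it later):

-- Placeholder names; list the real indexes and triggers on TABLE_A first.
DROP INDEX IF EXISTS table_a_field1_idx;
ALTER TABLE table_a DISABLE TRIGGER table_a_after_insert;  -- or DROP TRIGGER if you prefer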

EDIT: Step 1a

ALTER TABLE table_a SET UNLOGGED;

(repeat step 1 for each table you're inserting into)

Step 2. (unnecessary if you do one table at a time)

 begin transaction;

Step 3.

   INSERT INTO TABLE_A (field1, field2) 
   SELECT field1, field2 FROM TABLE_A_20180807_BCK;

(repeat step 3 for all tables being inserted into)

Step 4. (unnecessary if you do one table at a time)

 commit;

Step 5. Re-enable indexes and triggers on all tables.
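For example (again with placeholder names; re-create or re-enable whatever was removed in step 1):

-- Placeholder names; mirror whatever was dropped/disabled in step 1.
CREATE INDEX table_a_field1_idx ON table_a (field1);
ALTER TABLE table_a ENABLE TRIGGER table_a_after_insert;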

Step 5a.

 ALTER TABLE table_a SET LOGGED;

Upvotes: 2
