SteamedUp

Reputation: 131

PostgreSQL pg_dump/COPY

I have a requirement to dump the contents of a definable selection of tables as CSVs, for an initial load of systems that are unable to connect to PostgreSQL for various reasons.

I have written a script that runs through a list of tables, using psql with the -c flag to execute psql's \copy command and dump each table to a file, like this:

COPY table_name TO table_name.csv WITH (FORMAT 'csv', HEADER, QUOTE '\"', DELIMITER '|');
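The wrapper loop is roughly the following (a sketch only; tables.txt, holding one table name per line, and some_db are stand-ins):

#!/usr/bin/env bash
# Dump each listed table to its own CSV via psql's client-side \copy.
while read -r t; do
    # Each psql -c invocation is a separate session and transaction,
    # which is exactly the consistency problem described below.
    psql -d some_db -c "\\copy ${t} TO '${t}.csv' WITH (FORMAT 'csv', HEADER, QUOTE '\"', DELIMITER '|')"
done < tables.txt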

It works fine. But I am sure you have already spotted the problem: as the process takes ~57 minutes for ~60-odd tables, the likelihood of consistency is quite close to absolute zero.
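For what it's worth, the only way I know of to get that consistency out of plain psql is a single session with one REPEATABLE READ transaction, so every \copy sees the same snapshot. A minimal sketch (table_a and table_b are placeholders):

# One session, one snapshot: every \copy inside the transaction reads
# the same consistent state of the database.
psql -d some_db <<'SQL'
BEGIN ISOLATION LEVEL REPEATABLE READ;
\copy table_a TO 'table_a.csv' WITH (FORMAT 'csv', HEADER, QUOTE '"', DELIMITER '|')
\copy table_b TO 'table_b.csv' WITH (FORMAT 'csv', HEADER, QUOTE '"', DELIMITER '|')
COMMIT;
SQL

But that is still one strictly serial session, which does nothing for the 57 minutes.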

I had a think about it and suspected I could make a few lightweight changes to pg_dump to do what I want, i.e., create multiple CSVs from pg_dump whilst having a hope of integrity between the tables, and being able to specify parallel dumps too.

I have added a few flags to allow me to apply a file postfix (the date), set the format options and pass in a path for the relevant output file.

However, my modified pg_dump was failing when writing to a file, with a query like:

COPY table_name (pkey_id, field1, field2 ... fieldn) TO table_name.csv WITH (FORMAT 'csv', HEADER, QUOTE '"', DELIMITER '|')

Note: within pg_dump, the column list is expanded.

So I cast around for further information and found these COPY Tips.

It looks like writing to a file is a no-no over the network; however, I am on the same machine (for now). I felt writing to /tmp would be OK, as it is writable by anyone.
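One thing worth noting: both COPY ... TO <file> and COPY ... TO PROGRAM execute on the server, as the server's OS user, and since PostgreSQL 11 they require superuser or membership in the pg_write_server_files / pg_execute_server_program roles. A manual sanity check as the postgres superuser (table and path are illustrative) would look like:

# TO PROGRAM runs gzip on the server, as the server's OS user,
# regardless of which client user invokes psql.
psql -U postgres -d some_db \
     -c "COPY table_name TO PROGRAM 'gzip > /tmp/table_name.csv.gz' WITH (FORMAT 'csv', HEADER, QUOTE '\"', DELIMITER '|')"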

So I tried cheating with:

seingramp@seluonkeydb01:~$ ./tp_dump -a -t table_name -D /tmp/ -k "FORMAT 'csv', HEADER, QUOTE '\"', DELIMITER '|'" -K "_$DATE_POSTFIX"
tp_dump: warning: there are circular foreign-key constraints on this table:
tp_dump:   table_name
tp_dump: You might not be able to restore the dump without using --disable-triggers or temporarily dropping the constraints.
tp_dump: Consider using a full dump instead of a --data-only dump to avoid this problem.
--
-- PostgreSQL database dump
--

-- Dumped from database version 12.3
-- Dumped by pg_dump version 14devel

SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET xmloption = content;
SET client_min_messages = warning;
SET row_security = off;

--
-- Data for Name: material_master; Type: TABLE DATA; Schema: mm; Owner: postgres
--

COPY table_name (pkey_id, field1, field2 ... fieldn) FROM stdin;
tp_dump: error: query failed:
tp_dump: error: query was: COPY table_name (pkey_id, field1, field2 ... fieldn) TO PROGRAM 'gzip > /tmp/table_name_20200814.csv.gz' WITH (FORMAT 'csv', HEADER, QUOTE '"', DELIMITER '|')

I have neutered the data as it is customer-specific.

I didn't find pg_dump's error message very helpful; do you have any ideas as to what I am doing wrong? The changes really are quite small (excuse the code!), starting at ~line 1900 and ignoring the flags added around getopt().

        /*
         * Use COPY (SELECT ...) TO when dumping a foreign table's data, and when
         * a filter condition was specified.  For other cases a simple COPY
         * suffices.
         */
        if (tdinfo->filtercond || tbinfo->relkind == RELKIND_FOREIGN_TABLE)
        {
                /* Note: this syntax is only supported in 8.2 and up */
                appendPQExpBufferStr(q, "COPY (SELECT ");
                /* klugery to get rid of parens in column list */
                if (strlen(column_list) > 2)
                {
                        appendPQExpBufferStr(q, column_list + 1);
                        q->data[q->len - 1] = ' ';
                }
                else
                        appendPQExpBufferStr(q, "* ");

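                /*
                 * Custom path: format options (-k) were given, so have the
                 * server pipe the COPY output through gzip via TO PROGRAM.
                 */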
                if ( copy_from_spec )
                {
                        if ( copy_from_postfix )
                        {
                                appendPQExpBuffer(q, "FROM %s %s) TO PROGRAM 'gzip > %s%s%s.csv.gz' WITH (%s)",
                                                  fmtQualifiedDumpable(tbinfo),
                                                  tdinfo->filtercond ? tdinfo->filtercond : "",
                                                  copy_from_dest ? copy_from_dest : "",
                                                  fmtQualifiedDumpable(tbinfo),
                                                  copy_from_postfix,
                                                  copy_from_spec);
                        }
                        else
                        {
                                appendPQExpBuffer(q, "FROM %s %s) TO PROGRAM 'gzip > %s%s.csv.gz' WITH (%s)",
                                                  fmtQualifiedDumpable(tbinfo),
                                                  tdinfo->filtercond ? tdinfo->filtercond : "",
                                                  copy_from_dest ? copy_from_dest : "",
                                                  fmtQualifiedDumpable(tbinfo),
                                                  copy_from_spec);
                        }
                }
                else
                {
                        appendPQExpBuffer(q, "FROM %s %s) TO stdout;",
                                                  fmtQualifiedDumpable(tbinfo),
                                                  tdinfo->filtercond ? tdinfo->filtercond : "");
                }
        }
        else
        {
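                /*
                 * Plain COPY (no filter, not a foreign table): the same
                 * custom TO PROGRAM redirection applies here.
                 */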
                if ( copy_from_spec )
                {
                        if ( copy_from_postfix )
                        {
                                appendPQExpBuffer(q, "COPY %s %s TO PROGRAM 'gzip > %s%s%s.csv.gz' WITH (%s)",
                                                  fmtQualifiedDumpable(tbinfo),
                                                  column_list,
                                                  copy_from_dest ? copy_from_dest : "",
                                                  fmtQualifiedDumpable(tbinfo),
                                                  copy_from_postfix,
                                                  copy_from_spec);
                        }
                        else
                        {
                                appendPQExpBuffer(q, "COPY %s %s TO PROGRAM 'gzip > %s%s.csv.gz' WITH (%s)",
                                                  fmtQualifiedDumpable(tbinfo),
                                                  column_list,
                                                  copy_from_dest ? copy_from_dest : "",
                                                  fmtQualifiedDumpable(tbinfo),
                                                  copy_from_spec);
                        }
                }
                else
                {
                        appendPQExpBuffer(q, "COPY %s %s TO stdout;",
                                                  fmtQualifiedDumpable(tbinfo),
                                                  column_list);
                }
        }

I tried a couple of other cheats too, like specifying a directory owned by postgres. I know it's a quick hack but I hope you can help, and thanks for looking.

Upvotes: 4

Views: 8551

Answers (1)

Adrian Klaver

Reputation: 19665

This is a use case for pg_restore -f. So:

-- Create custom format dump file
pg_dump -d some_db -U some_user -Fc -f dump.out
-- Move that file to where you need it
-- Dump data only from named table to a file from the dump file.
pg_restore -a -t table_1 -f table_1_data.sql dump.out

pg_dump will create a consistent snapshot of the tables, so you have the database in a 'frozen' state in dump.out. Then you can use pg_restore to 'thaw out' the parts you need on your own schedule. By using -a you will get the COPY data you want.
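If several tables are needed, simply loop over them; the dump file is only read, never consumed (table names below are placeholders):

# Extract data-only SQL for each wanted table from the same dump file.
for t in table_1 table_2 table_3; do
    pg_restore -a -t "$t" -f "${t}_data.sql" dump.out
done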

Upvotes: 3
