aggis

Reputation: 698

Incremental archive and restoration of partial Postgresql tables

I am wondering what are the best practices to:

  1. Periodically (monthly) archive a portion of a Postgresql table to a .sql file

  2. Then have the ability to restore that portion to the original table if necessary

For example, say I have transactions_table that looks like this:

+----+------------+-------------+
| Id |    Date    | Transaction |
+----+------------+-------------+
|  1 | 2017-06-01 |         123 |
|  2 | 2017-06-02 |         120 |
|  3 | 2017-07-01 |          45 |
|  4 | 2017-08-01 |         420 |
+----+------------+-------------+

How would I go about archiving the data for June 2017 such that all rows from June 2017 are removed from transactions_table and I produce a June2017.sql file that contains INSERT commands and whatever else is needed so the data can easily be reintroduced into transactions_table? Is this possible with pg_dump, or is there a better alternative?

The PostgreSQL db is v9.6 and hosted on AWS. I plan to store June2017.sql and the like either in an S3 bucket or in long-term storage such as AWS Glacier. I can figure that part out; I just need to know how to archive the data appropriately.

Sorry for the broad question. Any and all help is appreciated.

Upvotes: 0

Views: 536

Answers (1)

sticky bit

Reputation: 37482

  1. pg_dump doesn't support a WHERE clause out of the box, but somebody provided a patch here that adds this functionality. You can try whether it works for you.

  2. An alternative would be to write a function that creates a new table, say archive_transactions_table, and then pg_dump that table (a possible pg_dump call is sketched below the list).

    CREATE FUNCTION create_archive_transactions_table
                    (_year integer,
                     _month integer)
                    RETURNS void
    AS
    $$
    BEGIN
      -- throw away any previous archive table and rebuild it with the requested month's rows
      DROP TABLE IF EXISTS archive_transactions_table;
      CREATE TABLE archive_transactions_table
                   AS SELECT *
                             FROM transactions_table
                             WHERE date_part('year', date) = _year
                                   AND date_part('month', date) = _month;
    END;
    $$
    LANGUAGE plpgsql;
    
  3. You could also roll your own INSERT statements by text concatenation, selecting them from transactions_table for the rows within the time period, and possibly put this in a function that uses COPY to export the generated statements directly to a file. COPY accepts an arbitrary query (so you can filter with WHERE), but it doesn't create INSERT statements on its own. An example call of such a function is shown below the list.

    CREATE FUNCTION archive_transactions_table
                    (_year integer,
                     _month integer,
                     _path text)
                    RETURNS void
    AS
    $$
    DECLARE
      _copy_command text;
    BEGIN
      -- Build the whole COPY ... TO '<file>' command as a single string and EXECUTE it,
      -- because the target file name has to be computed from _path, _year and _month.
      -- The doubled single quotes inside the literal end up as single quotes in the
      -- generated INSERT statements.
      _copy_command := 'COPY (SELECT ''INSERT INTO transactions_table (id, date, transaction) VALUES ('' || id || '', '''''' || date || '''''', '' || transaction || '');'' insert_query'
                    || '             FROM transactions_table'
                    || '             WHERE date_part(''year'', date) = ' || _year
                    || '                   AND date_part(''month'', date) = ' || _month || ')'
                    || 'TO ''' || regexp_replace(_path, '([^/])$', '\1/') || _year || '-' || lpad(_month::text, 2, '0') || '.sql''';
      EXECUTE _copy_command;
    END;
    $$
    LANGUAGE plpgsql;
    

    (Note: The path provided to COPY cannot be an expression (like _year || '-' || _month || '.sql'). That's why the whole command has to be built up dynamically.)
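
For example, assuming the function from option 3 has been created and the PostgreSQL server process can write to the given directory (COPY ... TO writes a file on the database server, not on the client), a call like the following would produce a 2017-06.sql file full of INSERT statements. The path here is only a placeholder:

    -- export June 2017 into /tmp/2017-06.sql (the directory is just an example)
    SELECT archive_transactions_table(2017, 6, '/tmp/');

    -- the resulting file then contains lines such as:
    -- INSERT INTO transactions_table (id, date, transaction) VALUES (1, '2017-06-01', 123);
    -- INSERT INTO transactions_table (id, date, transaction) VALUES (2, '2017-06-02', 120);

Restoring that month later is then just a matter of replaying the file against the database, e.g. with psql -f.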

Or any possible mix of the above.
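
With option 2, for instance, the monthly export and a later restore could look roughly like this. The database name, file name and the final copy-back step are only illustrative; --column-inserts makes pg_dump emit plain INSERT statements, which is what the question asks for:

    # dump only the freshly built archive table, data as INSERT statements
    pg_dump --table=archive_transactions_table --column-inserts --clean --if-exists \
            --file=June2017.sql mydatabase

    # restore later: replay the file, then copy the rows back into the original table
    psql --dbname=mydatabase --file=June2017.sql
    psql --dbname=mydatabase \
         --command="INSERT INTO transactions_table SELECT * FROM archive_transactions_table;"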

Have a look at pgAgent for creating a scheduled task.

Upvotes: 1
