aggis

Reputation: 698

Incremental archive and restoration of partial Postgresql tables

I am wondering what are the best practices to:

  1. Periodically (monthly) archive a portion of a Postgresql table to a .sql file

  2. Then have the ability to restore that portion to the original table if necessary

For example, say I have transactions_table that looks like this:

+----+------------+-------------+
| Id |    Date    | Transaction |
+----+------------+-------------+
|  1 | 2017-06-01 |         123 |
|  2 | 2017-06-02 |         120 |
|  3 | 2017-07-01 |          45 |
|  4 | 2017-08-01 |         420 |
+----+------------+-------------+

How would I go about archiving the data for June 2017 such that all rows from June 2017 are removed from transactions_table and I produce a June2017.sql file that contains INSERT commands and whatever else is needed so the data can easily be reintroduced into transactions_table? Is this possible with pg_dump, or is there a better alternative?

The PostgreSQL db is v9.6 and hosted on AWS. I plan to store June2017.sql and the like either in an S3 bucket or in long-term storage such as AWS Glacier. I can figure that part out; I just need to know how to archive the data appropriately.

Sorry for the broad question. Any and all help is appreciated.

Upvotes: 0

Views: 536

Answers (1)

sticky bit

Reputation: 37482

  1. pg_dump doesn't support a WHERE clause out of the box, but somebody provided a patch here that adds this functionality. You can try whether it works for you.

  2. An alternative would be to write a function that creates a new table, say archive_transactions_table, and then pg_dump that table (a possible pg_dump call is sketched below the list).

    CREATE FUNCTION create_archive_transactions_table
                    (_year integer,
                     _month integer)
                    RETURNS void
    AS
    $$
    BEGIN
      -- throw away any previous archive table and rebuild it with the requested month's rows
      DROP TABLE IF EXISTS archive_transactions_table;
      CREATE TABLE archive_transactions_table
                   AS SELECT *
                             FROM transactions_table
                             WHERE date_part('year', date) = _year
                                   AND date_part('month', date) = _month;
    END;
    $$
    LANGUAGE plpgsql;
    
  3. You could also roll your own INSERT statements by text concatenation, selecting them from transactions_table for the rows within the time period, and possibly put this in a function that uses COPY to export the generated statements directly to a file. COPY accepts an arbitrary query (so you can filter with WHERE), but it doesn't create INSERT statements on its own. An example call of such a function is shown below the list.

    CREATE FUNCTION archive_transactions_table
                    (_year integer,
                     _month integer,
                     _path text)
                    RETURNS void
    AS
    $$
    DECLARE
      _copy_command text;
    BEGIN
      -- Build the whole COPY ... TO '<file>' command as a single string and EXECUTE it,
      -- because the target file name has to be computed from _path, _year and _month.
      -- The doubled single quotes inside the literal end up as single quotes in the
      -- generated INSERT statements.
      _copy_command := 'COPY (SELECT ''INSERT INTO transactions_table (id, date, transaction) VALUES ('' || id || '', '''''' || date || '''''', '' || transaction || '');'' insert_query'
                    || '             FROM transactions_table'
                    || '             WHERE date_part(''year'', date) = ' || _year
                    || '                   AND date_part(''month'', date) = ' || _month || ')'
                    || 'TO ''' || regexp_replace(_path, '([^/])$', '\1/') || _year || '-' || lpad(_month::text, 2, '0') || '.sql''';
      EXECUTE _copy_command;
    END;
    $$
    LANGUAGE plpgsql;
    

    (Note: The path provided to COPY cannot be an expression (like _year || '-' || _month || '.sql'). That's why the whole command has to be built up dynamically.)
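
For example, assuming the function from option 3 has been created and the PostgreSQL server process can write to the given directory (COPY ... TO writes a file on the database server, not on the client), a call like the following would produce a 2017-06.sql file full of INSERT statements. The path here is only a placeholder:

    -- export June 2017 into /tmp/2017-06.sql (the directory is just an example)
    SELECT archive_transactions_table(2017, 6, '/tmp/');

    -- the resulting file then contains lines such as:
    -- INSERT INTO transactions_table (id, date, transaction) VALUES (1, '2017-06-01', 123);
    -- INSERT INTO transactions_table (id, date, transaction) VALUES (2, '2017-06-02', 120);

Restoring that month later is then just a matter of replaying the file against the database, e.g. with psql -f.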

Or any possible mix of the above.
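
With option 2, for instance, the monthly export and a later restore could look roughly like this. The database name, file name and the final copy-back step are only illustrative; --column-inserts makes pg_dump emit plain INSERT statements, which is what the question asks for:

    # dump only the freshly built archive table, data as INSERT statements
    pg_dump --table=archive_transactions_table --column-inserts --clean --if-exists \
            --file=June2017.sql mydatabase

    # restore later: replay the file, then copy the rows back into the original table
    psql --dbname=mydatabase --file=June2017.sql
    psql --dbname=mydatabase \
         --command="INSERT INTO transactions_table SELECT * FROM archive_transactions_table;"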

Have a look at pgAgent for creating a scheduled task.

Upvotes: 1
