Dennis Thrysøe

Reputation: 1911

Best method for PostgreSQL incremental backup

I am currently using pg_dump piped to gzip piped to split. The problem with this is that all of the output files change on every run, so a checksum-based backup always copies all of the data.
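
Roughly like this (database name, chunk size and paths are just for illustration):

pg_dump mydb | gzip | split -b 100m - /backups/mydb.sql.gz.

# and the corresponding restore:
cat /backups/mydb.sql.gz.* | gunzip | psql mydb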

Are there any other good ways to perform an incremental backup of a PostgreSQL database, where a full database can be restored from the backup data?

For instance, it would help if pg_dump could produce its output in a completely stable order, so that changes to the database only show up at the end of the dump, or something along those lines.

Upvotes: 83

Views: 95541

Answers (4)

Aayush Shah

Reputation: 703

If you're looking for a more efficient way to perform incremental backups, PostgreSQL 17 introduces a feature that makes them much easier.

PostgreSQL 17 has built-in support for incremental backups, which lets you back up only the changes made since the last backup. This can significantly reduce the time and storage required, especially for large databases.

Official documentation

I have made a sample Dockerfile to set up an environment for trying out incremental backups:

# Use the official PostgreSQL 17 image as a base
FROM postgres:17

# Set environment variables for PostgreSQL
ENV POSTGRES_USER=postgres
ENV POSTGRES_PASSWORD=mysecretpassword
ENV POSTGRES_DB=testdb

# Install necessary tools as root
USER root
RUN apt-get update && apt-get install -y \
    vim \
    nano \
    procps \
    && rm -rf /var/lib/apt/lists/*

# Switch to postgres user for initialization
USER postgres

# Reinitialize the database cluster
RUN rm -rf /var/lib/postgresql/data/* \
    && /usr/lib/postgresql/17/bin/initdb -D /var/lib/postgresql/data


# Create directories for backups and archive with correct permissions
RUN mkdir -p /var/lib/postgresql/data/pg_archive \
    /var/lib/postgresql/data/pg_base_backup \
    /var/lib/postgresql/data/pg_incr_backup \
    && chown -R postgres:postgres /var/lib/postgresql/data/pg_archive \
    /var/lib/postgresql/data/pg_base_backup \
    /var/lib/postgresql/data/pg_incr_backup

# Copy the custom PostgreSQL configuration
COPY postgresql.conf /etc/postgresql/postgresql.conf

USER root 

# Ensure custom configuration is included in the database configuration
RUN echo "include '/etc/postgresql/postgresql.conf'" >> /usr/share/postgresql/postgresql.conf.sample

# Expose PostgreSQL port
EXPOSE 5432

# Start the PostgreSQL server
CMD ["postgres"]

More Resources

Upvotes: 1

Stephen Frost

Reputation: 720

Also check out http://www.pgbackrest.org

pgBackRest is another backup tool for PostgreSQL which you should be evaluating, as it supports the following (a minimal usage sketch follows the list):

  • parallel backup (tested to scale almost linearly up to 32 cores, but can probably go much farther)
  • compressed-at-rest backups
  • incremental and differential (compressed!) backups
  • streaming compression (data is compressed only once at the source, then transferred across the network and stored)
  • parallel, delta restore (the ability to update an older copy to the latest)
  • full support for tablespaces
  • backup rotation and archive expiration
  • the ability to resume backups which failed for some reason
  • etc., etc.
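
For reference, a minimal backup/restore cycle with pgBackRest looks roughly like this, assuming a stanza named main has already been configured in pgbackrest.conf and created with the stanza-create command:

# Full backup
pgbackrest --stanza=main --type=full backup

# Incremental backup (only what changed since the last backup)
pgbackrest --stanza=main --type=incr backup

# Delta restore: update an existing data directory to the latest backup
pgbackrest --stanza=main --delta restore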

Upvotes: 19

intgr

Reputation: 20466

Update: Check out Barman for an easier way to set up WAL archiving for backup.

You can use PostgreSQL's continuous WAL archiving method. First you need to set wal_level = archive, then take a full filesystem-level backup (between issuing the pg_start_backup() and pg_stop_backup() commands), and then have newer WAL files copied over automatically by configuring the archive_command option.
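
A minimal sketch of that setup (the archive path is a placeholder; note that on recent PostgreSQL versions wal_level = replica covers this, and the functions have been renamed to pg_backup_start()/pg_backup_stop()):

# postgresql.conf
wal_level = archive
archive_mode = on
archive_command = 'cp %p /backups/wal_archive/%f'

-- in psql, wrapped around a filesystem-level copy of the data directory
SELECT pg_start_backup('nightly_base_backup');
--   ... copy the data directory, e.g. with rsync or tar, to /backups/base/ ...
SELECT pg_stop_backup();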

Advantages:

  • Incremental, the WAL archives include everything necessary to restore the current state of the database
  • Almost no overhead, copying WAL files is cheap
  • You can restore the database at any point in time (this feature is called PITR, or point-in-time recovery)

Disadvantages:

  • More complicated to set up than pg_dump
  • The full backup will be much larger than a pg_dump because all internal table structures and indexes are included
  • Does not work well for write-heavy databases, since recovery (replaying all the WAL archived since the base backup) will take a long time.

There are some tools, such as pitrtools and omnipitr, that can simplify setting up these configurations and restoring from them, but I haven't used them myself.

Upvotes: 81

Scott Marlowe

Reputation: 8870

Another method is to back up to plain text and use rdiff to create incremental diffs.
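
For example, with rdiff from librsync (file names invented for illustration):

# Dump to plain text
pg_dump mydb > dump_new.sql

# Signature of the previous dump, then a delta holding only what changed
rdiff signature dump_old.sql dump_old.sig
rdiff delta dump_old.sig dump_new.sql dump_incr.delta

# Later, the new dump can be rebuilt from the old one plus the delta
rdiff patch dump_old.sql dump_incr.delta dump_rebuilt.sql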

Upvotes: 0
