Reputation: 1222
When I began, I used pg_dump with the default plain format. I was unenlightened.
Research revealed time and file size improvements with pg_dump -Fc | gzip -9 -c > dumpfile.gz. I was enlightened.
When it came time to create the database anew:
# create tablespace dbname location '/SAN/dbname';
# create database dbname tablespace dbname;
# alter database dbname set temp_tablespaces = dbname;
% gunzip dumpfile.gz # to evaluate restore time without piped decompression
% pg_restore -d dbname dumpfile # into a new, empty database defined above
I felt unenlightened: the restore took 12 hours to create a database that is only a fraction of what it will become:
# select pg_size_pretty(pg_database_size('dbname'));
47 GB
Because this database is predicted to grow to a few terabytes, I need to look at improving performance now.
Please, enlighten me.
Upvotes: 106
Views: 90469
Reputation: 23900
I assume you need a backup, not a major upgrade of the database.
For backups of large databases you should set up continuous archiving instead of pg_dump.
Make your base backups, for example, every day by using:
psql template1 -c "select pg_start_backup('`date +%F-%T`')"
rsync -a --delete /var/lib/pgsql/data/ /var/backups/pgsql/base/
psql template1 -c "select pg_stop_backup()"
A restore would be as simple as restoring the database files and the WAL logs no older than the pg_start_backup time from the backup location and starting Postgres. It will be much faster.
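As a rough sketch of that restore path, assuming WAL segments are also being archived via archive_command to /var/backups/pgsql/wal (that path, the data directory, and the recovery.conf line are assumptions to adapt to your layout):
# stop Postgres and copy the base backup back into place
pg_ctl -D /var/lib/pgsql/data stop
rsync -a --delete /var/backups/pgsql/base/ /var/lib/pgsql/data/
# point recovery at the archived WAL (recovery.conf, as used by Postgres up to version 11)
echo "restore_command = 'cp /var/backups/pgsql/wal/%f %p'" > /var/lib/pgsql/data/recovery.conf
# start Postgres; it replays WAL forward from the base backup
pg_ctl -D /var/lib/pgsql/data start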
Upvotes: 11
Reputation: 9469
If you're facing issues with the speed of pg_restore, check whether you dumped your data using INSERT or COPY statements.
If you use INSERT (pg_dump is called with the --column-inserts parameter), restoring the data will be significantly slower.
Using INSERT is good for making dumps that are loaded into non-Postgres databases, but if you are restoring into Postgres, omit the --column-inserts parameter when running pg_dump.
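For comparison, a minimal sketch of the two dump modes (dbname is a placeholder):
% pg_dump -Fc dbname > dumpfile                   # default: data as COPY, fast to restore
% pg_dump --column-inserts dbname > portable.sql  # one INSERT per row, portable but slow to restore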
Upvotes: 3
Reputation: 1261
PG_DUMP | always use the directory format (-Fd) and the -j option
time pg_dump -j 8 -Fd -f /tmp/newout.dir fsdcm_external
PG_RESTORE | always tune postgresql.conf and use the directory format and the -j option
work_mem = 32MB
shared_buffers = 4GB
maintenance_work_mem = 2GB
full_page_writes = off
autovacuum = off
wal_buffers = -1
time pg_restore -j 8 --format=d -C -d postgres /tmp/newout.dir/
Upvotes: 64
Reputation: 8999
zcat dumpfile.gz | pg_restore -d db_name
This removes the full write of the uncompressed data to disk, which is currently your bottleneck.
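One caveat if you also want parallel restore: pg_restore cannot run its -j jobs from a pipe or standard input, so in that case decompress to a file first (the job count of 4 is only illustrative):
% zcat dumpfile.gz > dumpfile   # keeps the .gz; use gunzip instead if disk space is tight
% pg_restore -j 4 -d db_name dumpfile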
Upvotes: 8
Reputation: 54882
First check that you are getting reasonable I/O performance from your disk setup. Then check that your PostgreSQL installation is appropriately tuned. In particular: shared_buffers should be set correctly; maintenance_work_mem should be increased during the restore; full_page_writes should be off during the restore; wal_buffers should be increased to 16MB during the restore; checkpoint_segments should be increased to something like 16 during the restore; you shouldn't have any unreasonable logging on (like logging every statement executed); and autovacuum should be disabled during the restore.
If you are on 8.4, also experiment with parallel restore, the --jobs option for pg_restore.
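For concreteness, a sketch of those settings as they might appear in postgresql.conf during the restore (the specific values are illustrative, checkpoint_segments only exists on servers older than 9.5, and everything here should be reverted once the restore finishes):
maintenance_work_mem = 1GB
full_page_writes = off
wal_buffers = 16MB
checkpoint_segments = 16
autovacuum = off
log_statement = 'none'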
Upvotes: 69
Reputation: 16417
Two issues/ideas:
By specifying -Fc, the pg_dump output is already compressed. The compression is not maximal, so you may find some space savings by using "gzip -9", but I would wager it's not enough to warrant the extra time (and I/O) used compressing and uncompressing the -Fc version of the backup.
If you are using PostgreSQL 8.4.x you can potentially speed up the restore from a -Fc backup with the new pg_restore command-line option "-j n" where n=number of parallel connections to use for the restore. This will allow pg_restore to load more than one table's data or generate more than one index at the same time.
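Putting both ideas together, the dump and restore pipeline could be simplified to something like this (dbname and the job count are placeholders):
% pg_dump -Fc dbname > dumpfile
% pg_restore -j 4 -d dbname dumpfile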
Upvotes: 14
Reputation: 118701
As you may have guessed simply by the fact that compressing the backup results in faster performance, your backup is I/O bound. This should come as no surprise as backup is pretty much always going to be I/O bound. Compressing the data trades I/O load for CPU load, and since most CPUs are idle during monster data transfers, compression comes out as a net win.
So, to speed up backup/restore times, you need faster I/O. Beyond reorganizing the database to not be one huge single instance, that's pretty much all you can do.
Upvotes: 3