andrewH
andrewH

Reputation: 2321

Easiest way to replicate (copy? Export and import?) a large, rarely changing postgreSQL database

I have imported about 200 GB of census data into a postgreSQL 9.3 database on a Windows 7 box. The import process involves many files and has been complex and time-consuming. I'm just using the database as a convenient container. The existing data will rarely if ever change, and will be updating it with external data at most once a quarter (though I'll be adding and modifying intermediate result columns on a much more frequent basis. I'll call the data in the database on my desktop the “master.” All queries will come from the same machine, not remote terminals.

I would like to put copies of all that data on three other machines: two laptops, one windows 7 and one Windows 8, and on a Ubuntu virtual machine on my Windows 7 desktop as well. I have installed copies of postgreSQL 9.3 on each of these machines, currently empty of data. I need to be able to do both reads and writes on the copies. It is OK, and indeed I would prefer it, if changes in the daughter databases do not propagate backwards to the primary database on my desktop. I'd want to update the daughters from the master 1 to 4 times a year. If this wiped out intermediate results on the daughter databases this would not bother me.

Most of the replication techniques I have read about seem to be worried about transaction-by-transaction replication of a live and constantly changing server, and perfect history of queries & changes. That is overkill for me. Is there a way to replicate by just copying certain files from one postgreSQL instance to another? (If replication is the name of a specific form of copying, I'm trying to ask the more generic question). Or maybe by restoring each (empty) instance from a backup file of the master? Or of asking postgreSQL to create and export (ideally on an external hard drive) some kind of postgreSQL binary of the data that another instance of postgreSQL can import, without my having to define all the tables and data types and so forth again?

This question is also motivated by my desire to work around a home wifi/lan setup that is very slow – a tenth or less of the speed of file copies to an external hard drive. So if there is a straightforward way to get the imported data from one machine to another by transference of (ideally compressed) binary files, this would work best for my situation.

Upvotes: 0

Views: 364

Answers (1)

khampson
khampson

Reputation: 15306

While you could perhaps copy the data directory directly as mentioned by Nick Barnes in the comments above, I would recommend using a combination of pg_dump and pg_restore, which will dump a self-contained file which can then be dispersed to the other copies.

You can run pg_dump on the master to get a dump of the DB. I would recommend using the options -Fc -j3 to use the custom binary format (instead of dumping in SQL format; this should be much smaller and perhaps faster as well) and will dump 3 tables at once (this can be adjusted up or down depending on the disk throughput capabilities of your machine and the number of cores that it has).

Then you run dropdb on the copies, createdb to recreate an empty DB of the same name, and then run pg_restore on that new empty DB to restore the dump file to the DB. You would want to use the options -d <dbname> -f <dump_file> -j3 (again adjusting the number for -j according to the abilities of the machine).

When you want to refresh the copies with new content from the master DB, simply repeat the above steps

Upvotes: 1

Related Questions