Reputation: 21426
I had so much trouble doing this - I thought I would make a Q/A on StackOverflow to explain the process.
The question is about copying an RDS postgres database for development usage - especially for testing database migrations scripts, etc. That's why the focus on a "single schema" within a "single database".
In my case, I want to create a test database that's as isolated as possible, while remaining within a single RDS instance (because spinning up entire RDS instances takes anywhere from 5 - 15 minutes and because I'm cheap).
Upvotes: 10
Views: 11493
Reputation: 336
Since the requirement is to use a single RDS instance, just use
CREATE DATABASE "duplicate" WITH TEMPLATE "original" OWNER postgres;
where "original" is your source database and "duplicate" is the destination database you want to create.
Dumping the database is time costly as you have to download all the data of the database locally, while then you will have to upload them as well.
Reference : https://www.postgresql.org/docs/current/manage-ag-templatedbs.html
Upvotes: 0
Reputation: 21426
Here is an answer using only the command line.
Pre-requisites:
Example context:
rds.example.com
which has a master user named rds_master
.db_dev_user
, a database named dev_db
that contains the schema app_schema
.Note: this guide was written in 2017, for postgres version 9.6. If you find that some steps are no longer working on a recent version of postgres - please do post any fixes to this post as comments or alternative answers.
pg_dump prints out the schema and data of the original database and will work even while there are active connections to the database. Of course, performance for those connections is likely to be affected, but the resultant copy of the DB is transactional.
pg_dump --host=rds.example.com --port=5432 \
--format=custom \
--username=db_dev_user --dbname=dev_db \
> pgdumped
The createuser command creates the user that your test application/processes should connect with (for better isolation), note that the created user is not a superuser and it cannot create databases or roles.
createuser --host=rds.example.com --port=5432 \
--username=rds_master \
--no-createdb --no-createrole --no-superuser \
--login --pwprompt \
db_test_user
Without this next grant command the following createdb
will fail:
psql --host=rds.example.com --port=5432 \
--username=rds_master --dbname=postgres \
--command="grant db_test_user TO rds_master"
createdb does what it says on the tin; note that the db_test_user
role "owns" the DB.
createdb --host=rds.example.com --port=5432 \
--username=rds_master --owner=db_test_user test_db
The create schema command is next. The db_test_user
cannot create the schema, but it must be authorized for the schema or the pg_restore
would fail because it would end up trying to restore into the pg_catalog
schema (so note that user=rds_master
, but dbname=test_db
).
psql --host=rds.example.com --port=5432 \
--username=rds_master --dbname=test_db \
--command="create schema app_schema authorization db_test_user"
Finally, we issue the pg_restore command, to actually create the schema objects (tables, etc.) and load the data into them:
pg_restore --host=rds.example.com --port=5432 \
--verbose --exit-on-error --single-transaction \
--username=db_test_user --schema=app_schema \
--dbname=test_db --no-owner \
./pgdumped
exit-on-error
- because otherwise finding out what went wrong involves too much scrolling and scanning (also it's implied by single-transaction
anyway)single-transaction
- avoids having to drop or recreate the DB if things go pear-shapedschema
- only do the schema we care about (can also supply this to the original pg_dump
command)dbname
- to ensure use of the DB we createdno-owner
- we're connecting as db_test_user
anyway, so everything should be owned by the right userUpvotes: 14
Reputation: 21426
For production, you'd be better off just taking an RDS snapshot of your instance and restoring that, which will create an entirely new RDS instance.
On a mostly empty database - it takes a few minutes to create the snapshot and another 5 minutes or so to create the new RDS instance (that's part of why it's a pain during development).
You will be charged for the new RDS instance only while it is running. Staying within the free tier is one of the reasons I wanted to create this DB with the same instance for development purposes, plus not having to deal with a second DNS name; and that effect is multiplied as you start to have multiple small development environments.
Running a second RDS instance is the better option for production because you nearly completely eliminate any risk to your original DB. Also, when you're dealing with real amounts of data - snapshot/DB creation times will be dwarfed by the amount of time spent reading/writing the data. For large amounts of data, it's likely the Amazon RDS snapshot creation/restore process is going to have far better parallelisation than a set of scripts running on a single server somewhere. Additionally, the RDS console gives you visilibility into the progress of the restore - which becomes invaluable as the dataset grows larger and more people become involved.
Upvotes: 6