Steven
Steven

Reputation: 776

Best way to backup and restore data in PostgreSQL for testing

I'm trying to migrate our database engine from MsSql to PostgreSQL. In our automated test, we restore the database back to "clean" state at the start of every test. We do this by comparing the "diff" between the working copy of the database with the clean copy (table by table). Then copying over any records that have changed. Or deleting any records that have been added. So far this strategy seems to be the best way to go about for us because per test, not a lot of data is changed, and the size of the database is not very big.

Now I'm looking for a way to essentially do the same thing but with PostgreSQL. I'm considering doing the exact same thing with PostgreSQL. But before doing so, I was wondering if anyone else has done something similar and what method you used to restore data in your automated tests.

On a side note - I considered using MsSql's snapshot or backup/restore strategy. The main problem with these methods is that I have to re-establish the db connection from the app after every test, which is not possible at the moment.

Upvotes: 2

Views: 2090

Answers (2)

Steven
Steven

Reputation: 776

For my specific situation. I decided to go back to the original solution. Which is to compare the "working" copy of the database with "clean" copy of the database.

There are 3 types of changes.

  1. For INSERT records - find max(id) from clean table and delete any record on working table that has higher ID
  2. For UPDATE or DELETE records - find all records in clean table EXCEPT records found in working table. Then UPSERT those records into working table.

Upvotes: 1

Robins Tharakan
Robins Tharakan

Reputation: 2483

If you're okay with some extra storage, and if you (like me) are particularly not interested in re-inventing the wheel in terms of checking for diffs via your own code, you should try creating a new DB (per run) via templates feature of createdb command (or CREATE DATABASE statement) in PostgreSQL.

So for e.g.

(from bash) createdb todayDB -T snapshotDB

or

(from psql) CREATE DATABASE todayDB TEMPLATE snaptshotDB;

Pros:

  • In theory, always exact same DB by design (no custom logic)
  • Replication is a file-transfer (not DB restore). So far less time taken (i.e. doesn't run SQL again, doesn't recreate indexes / restore tables etc.)

Cons:

  • Takes 2x the disk space (although template could be on a low performance NFS etc)

Upvotes: 5

Related Questions