Ian Storm Taylor
Ian Storm Taylor

Reputation: 8710

How to schedule a continual copy of a database from production to staging on Heroku?

I'm looking to setup a schedule such that my staging database is always a copy of my production database, and have the copy happen once a day or once an hour.

I know that there's an easy way to perform a one-off copy, using heroku pg:

heroku pg:copy SOURCE_APP::SOURCE_DATABASE DESTINATION_DATABASE --app DESTINATION_DATABASE

But that's just a one-time thing.

I also know that it's easy to run scheduled commands with Heroku scheduler, which are run in one-off dynos. But the dynos don't have the Heroku Toolbelt installed on them, so heroku pg:copy isn't available.

I've tried using pg_dump | pg_restore, but I run into one of two issues:

Without restricting the schema, it fails because I don't have the right permissions I think:

pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 3205; 0 0 COMMENT EXTENSION plpgsql 
pg_restore: [archiver (db)] could not execute query: ERROR:  must be owner of extension plpgsql
    Command was: COMMENT ON EXTENSION plpgsql IS 'PL/pgSQL procedural language';

With restricting the pg_dump to only --schema=public, it fails because some extensions depend on the public schema, and they aren't being dropped.

pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 9; 2615 16385 SCHEMA public Storm
pg_restore: [archiver (db)] could not execute query: ERROR:  cannot drop schema public because other objects depend on it
DETAIL:  extension citext depends on schema public
extension pg_stat_statements depends on schema public
HINT:  Use DROP ... CASCADE to drop the dependent objects too.
    Command was: DROP SCHEMA IF EXISTS public;

What's the best way do this?

Upvotes: 10

Views: 894

Answers (2)

David Aldridge
David Aldridge

Reputation: 52376

There's no reason why you can't download and install the CLI on the dyno as a precursor to every copy of the production database.

All you need is a shell script to call from the scheduler, and let the shell script install the CLI and use environment variables as appropriate to hold the required credentials.

Simple two-step process ...

  1. Install CLI
  2. Use CLI to perform copy of database

In fact it can be a standalone app without any web or worker processes that runs this.

We use a similar method to run our own backups and to upload them to an AWS S3 bucket, independently of Heroku's own backup, so we install both the Heroku and the AWS CLIs.

Upvotes: 4

Julie
Julie

Reputation: 2011

Have you considered creating a database follower (aka master/slave configuration)? https://devcenter.heroku.com/articles/heroku-postgres-follower-databases

This would continually update your follower database with data from production. Note that if you changed data directly on the follower database, that would not necessarily be overwritten, so not sure if that is a possible solution for what you need.

Upvotes: 6

Related Questions