Reputation: 3272
I have often the needs to copy my production database into my dev environment. For that I have a rake task that pull the database from my production server creating a .dump file then I recreate from scratch my development database.
Here is my rake task :
namespace :db do
desc 'Pull production db to development'
task :pull => [:dump, :restore]
task :dump do
dumpfile = "#{Rails.root}/tmp/latest.dump"
puts 'PG_DUMP on production database...'
production = Rails.application.config.database_configuration['production']
system "ssh root@my_prod_server 'postgres:export prod-db' > #{dumpfile}"
puts 'Done!'
end
task :restore do
dev = Rails.application.config.database_configuration['development']
dumpfile = "#{Rails.root}/tmp/latest.dump"
puts 'PG_RESTORE on development database...'
system "pg_restore --verbose --clean --no-acl --no-owner -h 127.0.0.1 -U #{dev['username']} -d #{dev['database']} #{dumpfile}"
puts 'Done!'
end
end
It's working great.
The problem is, it's start to becoming bigger and bigger and for each row I have large amount of jsonb
data. At this moment my latest.dump file is around 700MB and take ~5min to perform.
Is there any other solution than this to speed up this process. Would this be possible to just copy/paste the whole DB at once instead of recreating every row from scratch every time ?
Thanks !
Upvotes: 1
Views: 273
Reputation: 165576
As you're discovering, copying your production database to development does not scale. In addition to the sheer size, there is are security issues and PII concerns with developers being able to access, let alone have root access to and copy, the production database.
Regardless of what you're doing with the data, copying the production database should not be a regular part of the development process. If you're doing this regularly, question your process. It should be slow and difficult so developers avoid doing it.
Depending on what you're doing, better alternatives are...
Set up a database which continually follows production. Connect your development environment to that. There are still PII concerns with letting developers connect to a follower, but the database is read-only so the developers cannot accidentally damage the data.
This is useful for realistic performance testing, or to investigate a nasty bug which requires investigation of production data.
And, as jjanes answered it can be detached and promoted to being a read-write database.
A follower is also useful for production. Expensive read-only queries which don't require up-to-the-second accuracy such as reports, statistics, and summaries can be done on the follower.
If you just need realistic test data, use tools such as FactoryBot and Faker to easily create complex, interconnected data.
You can also reduce the size of your production database.
Often only a fraction of the data in the production database is needed for production. The rest is old data useful for auditing, statistics, and analysis.
You can reduce the size of your production database by moving historical data into a data warehouse such as Google BigQuery. Then do your audits, statistics, and analysis on the data warehouse; it's much better set up for that.
jsonb is appropriate for unstructured data where you're not sure what will be stored. It is not a substitute for a proper schema. Consider what you're putting into jsonb and consider whether it would be better served by a real column. This can reduce the size of your database.
Upvotes: 4
Reputation: 44383
You could set up a hot standby so that it is following along just behind production. Then when you need it, just promote it to read-write and it has become your new dev. It should only take a fraction of second to promote (assuming it existed and was not too far behind to start with). Then immediately start standing up the next hot standby so that it is ready to go when you want it.
Upvotes: 2