Sector7B

Reputation: 133

Pulling a sample of your database - PostgreSQL

We have a semi-large database that takes a long time to import locally on our individual dev machines. I was wondering if there were any tools that could connect to a database, do a little analysis, and pull a sample of the database while keeping all the relations intact, with the sample size controlled by a passed-in parameter. Something like:

    pg_sample --rough_size 1000 --dbhost mydbhost --dbuname uname --dbpword pword

This would create a dump that I can import into my database for testing and development, but that won't take 45 minutes to do it.

Thanks

Upvotes: 8

Views: 883

Answers (2)

mla

Reputation: 304

I wrote such a tool: https://github.com/mla/pg_sample

From the README:

pg_sample is a utility for exporting a small, sample dataset from a larger PostgreSQL database. The output and command-line options closely resemble the pg_dump backup utility (although only the plain-text format is supported).

The sample database produced includes all tables from the original, maintains referential integrity, and supports circular dependencies.
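A minimal sketch of an invocation, using the host and user names from the question as placeholders (the connection flags follow pg_dump conventions as the README describes; check the README for the exact --limit syntax):

    # Sample up to 1000 rows per table while preserving foreign-key relations
    pg_sample --limit=1000 -h mydbhost -U uname mydb > sample.sql

    # Load the sample into a local dev database
    createdb mydb_dev
    psql mydb_dev < sample.sql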

Upvotes: 8

Andrew

Reputation: 1157

I suggest you investigate pg_dump --format=custom and pg_restore -j 12. The custom format allows parallel restores (and can still be converted to raw SQL if desired, so you aren't losing any functionality here). The -j parameter to pg_restore runs the restore in parallel, which can provide a substantial speed-up, especially if you have cranked your maintenance_work_mem up nice and large.
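A minimal sketch of that workflow (the database names are placeholders, and -j should be tuned to your machine's core count):

    # Dump in the custom format, which is compressed and supports parallel restore
    pg_dump --format=custom --file=mydb.dump mydb

    # Restore into a local dev database with 12 parallel jobs
    createdb mydb_dev
    pg_restore --jobs=12 --dbname=mydb_dev mydb.dump

    # Raising maintenance_work_mem (e.g. in postgresql.conf) speeds up the
    # index builds that dominate restore time:
    #   maintenance_work_mem = 1GB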

It won't directly solve the problem you asked about, for the reasons Erwin outlined above, and because it requires a level of knowledge about what is important to represent and what isn't that simply isn't in the schema. That is something that takes up developer time and should generally be done at the same time the schema is being built in the first place.

Upvotes: 2
