Boris Sokolov

Reputation: 21

How to export a huge PostgreSQL database in batches?

I have a huge database of over 20 million rows. I can export the whole database (which takes hours), but when I try to filter the data with Python (pandas), PyCharm fails due to memory issues.

Is there a way to export the database in batches of 2 million rows, for example? Export 2 million rows, then another 2 million, and end up with 10 files of 2 million rows each? That way I could filter each file with Python (pandas) without running into memory issues.

Thanks!

Upvotes: 0

Views: 3817

Answers (2)

jjanes

Reputation: 44167

20 million rows isn't that many. Unless each individual row is quite large or you are sending over a slow network, exporting should take minutes, not hours.

If it is a single table you want to split, that is a job better suited to COPY than to pg_dump.

psql -c 'copy pgbench_accounts to stdout' | split -l 2000000

You would run it at a terminal command prompt, just as you would pg_dump. It streams the table's data much like pg_dump does (except it is data only, and for just one table), and the Linux command split then breaks the output into files of 2,000,000 lines each.
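
If you want each piece to land in a predictably named CSV file, the same idea can be spelled out a bit more. This is just a sketch, assuming a database called mydb, the pgbench_accounts table from above, and GNU coreutils split (the -d and --additional-suffix flags are GNU-specific):

# Stream the table as CSV and cut it into 2,000,000-line pieces named
# part_00.csv, part_01.csv, ... instead of split's default xaa, xab, ...
psql -d mydb -c "copy pgbench_accounts to stdout with (format csv)" \
  | split -l 2000000 -d --additional-suffix=.csv - part_

Each resulting file can then be loaded and filtered on its own in pandas without holding the whole table in memory.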

Well, I see you didn't specifically mention pg_dump in your post. When you said "export", what did you mean by that? I had assumed you meant pg_dump.

Upvotes: 2

Slumdog

Reputation: 470

You can use pg_dump to extract only one or more specific tables, or to exclude tables, if that helps.
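
For example, something along these lines (a sketch; mydb and the table names are placeholders, -t selects tables to dump and -T excludes them):

# Dump only the accounts table
pg_dump -d mydb -t accounts -f accounts.sql

# Dump everything except a large log table
pg_dump -d mydb -T big_log_table -f everything_else.sql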

Upvotes: 0
