Reputation: 21
I have a huge database of over 20 million rows. I can export the whole database (which takes hours), but when I try to filter the data using Python (pandas), PyCharm fails due to memory issues.
Is there a way to export the database in batches of 2 million rows, for example? Export 2 million, then the next 2 million, and end up with 10 files of 2 million rows each? That way I can filter every file using Python (pandas) without running into memory issues.
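For the filtering step, something like the rough sketch below is what I have in mind (the file names, the column, and the condition are just placeholders):

import pandas as pd

# file names, the "amount" column, and the condition are placeholders
parts = []
for i in range(10):
    # chunksize keeps memory bounded even within a single 2-million-row file
    for chunk in pd.read_csv(f"part_{i:02d}.csv", chunksize=500_000):
        parts.append(chunk[chunk["amount"] > 1000])

filtered = pd.concat(parts, ignore_index=True)
filtered.to_csv("filtered.csv", index=False)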
Thanks!
Upvotes: 0
Views: 3817
Reputation: 44167
20 million rows isn't that many. Unless the individual rows are quite large or you are sending them over a slow network, exporting should take minutes, not hours.
If it is a single table you want to split, that is a job for COPY rather than pg_dump.
psql -c 'copy pgbench_accounts to stdout' | split -l 2000000
You would type it at a terminal command prompt, just like you would pg_dump. It will send out the table's data just like pg_dump does (except it sends just the data, and just for that one table), and then the Linux command split breaks the stream up into files of 2 million lines each.
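If you would rather stay in Python end to end, a rough equivalent of the same idea (sketched here with psycopg2 rather than the COPY | split pipeline above) is to stream the rows through a server-side (named) cursor and start a new CSV file every 2 million rows. The connection string, table, and file names below are placeholders.

import csv
import psycopg2

ROWS_PER_FILE = 2_000_000            # 2 million rows per output file

# connection string and table name are placeholders
conn = psycopg2.connect("dbname=mydb")

with conn, conn.cursor(name="export_cur") as cur:   # named cursor = server-side, streams rows
    cur.itersize = 100_000           # rows fetched per network round trip while iterating
    cur.execute("SELECT * FROM pgbench_accounts")

    part = 0
    written = ROWS_PER_FILE          # forces a new file before the first row
    out = None
    writer = None
    for row in cur:
        if written >= ROWS_PER_FILE:
            if out is not None:
                out.close()
            out = open(f"accounts_part_{part:02d}.csv", "w", newline="")
            writer = csv.writer(out)
            part += 1
            written = 0
        writer.writerow(row)
        written += 1
    if out is not None:
        out.close()

The shell pipeline is still the simpler route and is likely faster, since COPY streams the data in bulk instead of fetching it row by row; the sketch is only for cases where everything has to happen inside one Python script.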
Well, I see you didn't specifically mention pg_dump in your post. When you said "export", what did you mean by that? I had assumed you meant pg_dump.
Upvotes: 2
Reputation: 470
You can use pg_dump to extract only one or more tables (-t/--table) or to exclude tables (-T/--exclude-table), if that helps.
Upvotes: 0