Reputation: 17541
I'm trying to bulk load around 200M lines (3.5GB) of data into an Amazon RDS PostgreSQL DB using the following command:
cat data.csv | psql -h<host>.rds.amazonaws.com -U<user> <db> -c "COPY table FROM STDIN DELIMITER AS ','"
After a couple of minutes I get this error:
connection not open
connection to server was lost
If I instead send only the first 100M lines with head -n 100000000 data.csv rather than all 200M, the command succeeds. I'm guessing there's a timeout somewhere that causes the query with the full dataset to fail, but I haven't been able to find any relevant timeout settings or parameters.
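For reference, the truncated test is just the same pipeline with head in place of cat:
head -n 100000000 data.csv | psql -h<host>.rds.amazonaws.com -U<user> <db> -c "COPY table FROM STDIN DELIMITER AS ','"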
How can I make the bulk insert succeed with the full dataset?
Upvotes: 3
Views: 2546
Reputation: 78561
As I read the statement you're using, it basically builds one giant string from the file, then connects to the database, and then tries to feed that entire string in as the argument.
If you instead open psql and run something like \copy ... from '/path/to/data.csv' ..., I'd imagine the connection might stay alive because the file's contents are streamed chunk by chunk.
That would be my hunch as to why 100M lines work (the argument gets pushed entirely before the connection times out) but the entire file does not (the argument is still uploading when the connection drops).
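For example, something along these lines might be worth trying (reusing the placeholders from your command; as far as I know \copy accepts the same options COPY does, so the delimiter clause should carry over):
psql -h<host>.rds.amazonaws.com -U<user> <db> -c "\copy table FROM 'data.csv' DELIMITER AS ','"
\copy reads the file on the client and streams it to the server over the open connection, which is also the only workable route on RDS, since a server-side COPY ... FROM 'file' would look for the file on the database host itself.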
Upvotes: 3