Reputation: 17541
I'm trying to bulk load around 200M lines (3.5GB) of data into an Amazon RDS PostgreSQL DB using the following command:
cat data.csv | psql -h<host>.rds.amazonaws.com -U<user> <db> -c "COPY table FROM STDIN DELIMITER AS ','"
After a couple of minutes I get this error:
connection not open
connection to server was lost
If I instead send only the first 100M lines with head -n 100000000 data.csv rather than all 200M, the command succeeds. I'm guessing there's a timeout somewhere that causes the query with the full dataset to fail, but I haven't been able to find any relevant timeout settings or parameters.
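For reference, the truncated test is just the same pipeline with head in place of cat:
head -n 100000000 data.csv | psql -h<host>.rds.amazonaws.com -U<user> <db> -c "COPY table FROM STDIN DELIMITER AS ','"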
How can I make the bulk insert succeed with the full dataset?
Upvotes: 3
Views: 2546
Reputation: 78561
As I read the statement you're using, it basically builds one giant string from the file, then connects to the database, and then tries to feed that entire string in as the argument.
If you instead open psql and run something like \copy ... from '/path/to/data.csv' ..., I'd imagine the connection might stay alive because the file's contents are streamed chunk by chunk.
That would be my hunch as to why 100M lines work (the argument gets pushed entirely before the connection times out) but the entire file does not (the argument is still uploading when the connection drops).
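For example, something along these lines might be worth trying (reusing the placeholders from your command; as far as I know \copy accepts the same options COPY does, so the delimiter clause should carry over):
psql -h<host>.rds.amazonaws.com -U<user> <db> -c "\copy table FROM 'data.csv' DELIMITER AS ','"
\copy reads the file on the client and streams it to the server over the open connection, which is also the only workable route on RDS, since a server-side COPY ... FROM 'file' would look for the file on the database host itself.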
Upvotes: 3