Ben Dowling

Reputation: 17541

Postgres copy query connection lost error

I'm trying to bulk load around 200M lines (3.5GB) of data to an Amazon RDS postgresql DB using the following command:

cat data.csv | psql -h<host>.rds.amazonaws.com -U<user> <db> -c "COPY table FROM STDIN DELIMITER AS ','"

After a couple of minutes I get this error:

connection not open
connection to server was lost

If I use head -n 100000000 data.csv to send only the first 100M lines instead of all 200M, the command succeeds. I'm guessing there's a timeout somewhere that's causing the query with the full dataset to fail, but I haven't been able to find any relevant timeout settings or parameters.
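
That test is the same pipeline, just truncated with head:

head -n 100000000 data.csv | psql -h<host>.rds.amazonaws.com -U<user> <db> -c "COPY table FROM STDIN DELIMITER AS ','"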

How can I make the bulk insert succeed with the full dataset?

Upvotes: 3

Views: 2546

Answers (1)

Denis de Bernardy

Reputation: 78561

As I read it, the statement you're using basically builds one giant string, connects to the server, and then tries to feed the entire string in as a single argument.

If you open psql and run something like \copy ... from '/path/to/data.csv' ..., I'd imagine the connection stays alive while the file's contents are streamed chunk by chunk.
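
Adapted to the question's table and delimiter (the file path here is just a placeholder), that would look something like the following, with the \copy line typed at the psql prompt:

psql -h<host>.rds.amazonaws.com -U<user> <db>
\copy table FROM '/path/to/data.csv' DELIMITER AS ','

\copy runs client-side: psql reads the file itself and streams it over the open connection using the server's COPY protocol.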

That would be my hunch as to why 100M lines works (= the argument gets pushed through entirely before the connection times out) but the entire file doesn't (= the argument is still uploading when the timeout hits).

Upvotes: 3
