Reputation: 11577
What's the easiest way to load a large csv file into a Postgres RDS database in AWS using Python?
To transfer data to a local postgres instance, I have previously used a psycopg2
connection to run SQL statements like:
COPY my_table FROM 'my_10gb_file.csv' DELIMITER ',' CSV HEADER;
However, when executing this against a remote AWS RDS database, it generates an error because the .csv file is on my local machine rather than on the database server:
ERROR: must be superuser to COPY to or from a file
SQL state: 42501
Hint: Anyone can COPY to stdout or from stdin. psql's \copy command also works for anyone.
This answer explains why this doesn't work.
I'm now looking for the Python syntax to do this using psql. I have a large number of .csv files to upload, so I need a script to automate the process.
Upvotes: 1
Views: 2114
Reputation: 11577
First, you need to create the table definitions in the RDS Postgres database as usual, using CREATE TABLE SQL statements.
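Since you already have psycopg2, you can run those CREATE TABLE statements from Python too. A minimal sketch, assuming placeholder connection details and a made-up column layout that you would replace with your own schema:

import psycopg2

# Hypothetical connection details -- replace with your own RDS endpoint and credentials
conn = psycopg2.connect(host="YOUR_HOST", dbname="YOUR_DBNAME",
                        user="YOUR_USERNAME", password="YOUR_PASSWORD")

with conn, conn.cursor() as cur:
    # Placeholder columns -- adjust to match the layout of your CSV files
    cur.execute("""
        CREATE TABLE IF NOT EXISTS my_table (
            id integer,
            name text,
            value numeric
        )
    """)
conn.close()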
Then you need to run a psql command like this:
psql -p 5432 --host YOUR_HOST --username YOUR_USERNAME --dbname YOUR_DBNAME --command "\copy my_table FROM 'my_10gb_file.csv' DELIMITER ',' CSV HEADER"
In Python, we can set this up and execute it as follows:
host = "YOUR_HOST"
username = "YOUR_USERNAME"
dbname = "YOUR_DBNAME"
table_name = "my_table"
file_name = "my_10gb_file.csv"
command = "\copy {} FROM '{}' DELIMITER ',' CSV HEADER".format(table_name, file_name)
psql_template = 'psql -p 5432 --host {} --username {} --dbname {} --command "{}"'
bash_command = psql_template.format(host, username, dbname, command.strip())
process = subprocess.Popen(bash_command, stdout=subprocess.PIPE, shell=True)
output, error = process.communicate()
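Since you have many .csv files, you can wrap this in a loop. A minimal sketch, assuming each file is named after its target table and sits in the current directory, and that the password is supplied via a ~/.pgpass file or the PGPASSWORD environment variable so psql does not prompt interactively:

import glob
import os
import subprocess

host = "YOUR_HOST"
username = "YOUR_USERNAME"
dbname = "YOUR_DBNAME"

# Assumption: each CSV is named <table_name>.csv
for file_name in glob.glob("*.csv"):
    table_name = os.path.splitext(os.path.basename(file_name))[0]
    command = r"\copy {} FROM '{}' DELIMITER ',' CSV HEADER".format(table_name, file_name)
    bash_command = 'psql -p 5432 --host {} --username {} --dbname {} --command "{}"'.format(
        host, username, dbname, command)
    process = subprocess.Popen(bash_command, stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE, shell=True)
    output, error = process.communicate()
    if process.returncode != 0:
        # Surface psql errors instead of silently continuing with the next file
        print("Failed to load {}: {}".format(file_name, error.decode()))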
Upvotes: 1