Reputation: 11
I am new to the cloud and to data engineering as well.
I have a large CSV file stored in a GCS bucket. I would like to write a Python script that bulk-inserts the data into a PostgreSQL database on my local machine using a COPY statement, but I cannot figure out the authentication.
I would like to do something like this:
import psycopg2

conn = psycopg2.connect(database=database,
                        user=user,
                        password=password,
                        host=host,
                        port=port)
cursor = conn.cursor()

file = 'https://storage.cloud.google.com/<my_project>/<my_file.csv>'
sql_query = f"COPY <MY_TABLE> FROM '{file}' WITH CSV"
cursor.execute(sql_query)
conn.commit()
conn.close()
I get this error message:
psycopg2.errors.UndefinedFile: could not open file "https://storage.cloud.google.com/<my_project>/<my_file.csv>" for reading: No such file or directory
HINT:  COPY FROM instructs the PostgreSQL server process to read a file. You may want a client-side facility such as psql's \copy.
The same happens when I run the query in psql.
I assume the problem is in authentication. I have set up Application Default Credentials with the Google Cloud CLI, and when acting as the authenticated user I can easily download the file with wget. When I switch to the postgres user, I get an "access denied" error.
ADC seems to work only with client libraries and command-line tools.
I use Ubuntu 22.04.1 LTS.
Thanks for any help.
Upvotes: 0
Views: 169
Reputation: 22893
This is not going to work for you. COPY FROM expects a path on the server's local filesystem that the server process is permitted to read; it will not fetch a file over HTTP.
You can supply a program or script that fetches the file for you and prints it to STDOUT, which the server can consume, as in the sketch below.
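A minimal sketch of that route, assuming gsutil is installed on the database host and the postgres OS user has credentials that can read the bucket; the table, bucket and object names and the connection details are placeholders:

import psycopg2

# Placeholder connection details for the local server.
conn = psycopg2.connect(database="mydb", user="postgres",
                        password="secret", host="localhost", port=5432)
cursor = conn.cursor()

# COPY ... FROM PROGRAM runs the command as the server's OS user (postgres),
# so that user needs its own access to the bucket, and the database role
# needs superuser or pg_execute_server_program membership.
sql_query = ("COPY <MY_TABLE> FROM PROGRAM "
             "'gsutil cat gs://<my_bucket>/<my_file.csv>' WITH CSV")
cursor.execute(sql_query)

conn.commit()
conn.close()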
Or do what the error message suggests and handle it client-side with psycopg2's copy support (see the second sketch below).
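A minimal sketch of the client-side route, assuming the google-cloud-storage library is installed (it honours your Application Default Credentials) and using psycopg2's copy_expert to stream the object to COPY ... FROM STDIN; again, names and connection details are placeholders:

import psycopg2
from google.cloud import storage

# The storage client authenticates with your Application Default Credentials,
# so the postgres OS user never needs access to the bucket.
client = storage.Client()
blob = client.bucket("<my_bucket>").blob("<my_file.csv>")

# Placeholder connection details for the local server.
conn = psycopg2.connect(database="mydb", user="postgres",
                        password="secret", host="localhost", port=5432)
cursor = conn.cursor()

# COPY ... FROM STDIN reads from the client connection, so the server
# process never has to open the file itself.
with blob.open("rt") as f:
    cursor.copy_expert("COPY <MY_TABLE> FROM STDIN WITH CSV", f)

conn.commit()
conn.close()

This keeps authentication entirely on the client side, which matches how ADC already works for you.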
Upvotes: 2