Reputation: 13
Struggling with this Python issue as I'm new to the language and don't have significant experience with it. I currently have a CSV file containing around 20 headers and about the same number of rows, so listing each one out like some of the examples here is what I'm trying to avoid: https://www.dataquest.io/blog/loading-data-into-postgres/
My code consists of the following so far:
with open('dummy-data.csv', 'r') as f:
    reader = csv.reader(f)
    next(reader)
    for row in reader:
        cur.execute('INSERT INTO messages VALUES', (row))
I'm getting a syntax error at the end of the input, so I assume it's linked to the way my execute call is written, but I still don't know what I would do to address the issue. Any help?
P.S. I understand that people use %s for this, but if that's the case, can it be avoided? I don't want it duplicated 20 times in one line.
Upvotes: 0
Views: 672
Reputation: 44137
If you want a single placeholder that covers a whole list of values, you can use a different method, located in "extras", which covers that usage:
psycopg2.extras.execute_values(cur, 'INSERT INTO messages VALUES %s', (row,))
This method can take many rows at a time (which is good for performance), which is why you need to wrap your single row in (...,).
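As a rough sketch (assuming cur and conn are the open psycopg2 cursor and connection from the question, and the table's columns match the CSV), the whole file can be loaded in one call:

import csv
import psycopg2.extras

with open('dummy-data.csv', 'r') as f:
    reader = csv.reader(f)
    next(reader)           # skip the header row
    rows = list(reader)    # collect all remaining rows

# The single %s stands in for each row tuple; execute_values expands it
# and sends the whole batch in one statement.
psycopg2.extras.execute_values(cur, 'INSERT INTO messages VALUES %s', rows)
conn.commit()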
Upvotes: 1
Reputation: 77902
Basically, you DO have to specify at least the required placeholders - and preferably the field names too - in your query.
If it's a one-shot affair and you know which fields are in the CSV and in which order, then you can simply hardcode them in the query, i.e.
SQL = "insert into tablename(field1, field2, field21) values(%s, %s, %s)"
Ok, for 20 or so fields this gets quite boring, so you can also use a list of field names to generate both the field names part and the placeholders:
fields = ["field1", "field2", "field21"]
placeholders = ["%s"] * len(fields) # list multiplication, yes
SQL = "insert into tablename({}) values({})".format(", ".join(fields), ", ".join(placeholders))
If by chance the CSV header row contains the exact field names, you can also just use this row as the value for fields - but you have to trust the CSV then.
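A minimal sketch of that idea (assuming cur is the question's open cursor and the table is named messages):

import csv

with open('dummy-data.csv', 'r') as f:
    reader = csv.reader(f)
    fields = next(reader)                 # header row doubles as the field list
    placeholders = ["%s"] * len(fields)
    # NB: the field names are formatted into the SQL text, so this is only
    # safe if you trust the CSV header (see the caveat above).
    SQL = "insert into messages({}) values({})".format(
        ", ".join(fields), ", ".join(placeholders))
    for row in reader:
        cur.execute(SQL, row)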
NB: specifying the fields list in the query is not strictly required, but it can protect you from possible issues with a malformed CSV. Actually, unless you really trust the source (your CSV), you should actively validate the incoming data before sending it to the database.
NB2: "%s is for strings I know but would it work the same for timestamps?"
In this case, "%s" is not used as a Python string format specifier but as a plain database query placeholder. The choice of the string format specifier here is really unfortunate, as it creates a lot of confusion. Note that this is DB-vendor specific though; some vendors use "?" instead, which is much clearer IMHO (and you want to check your own db-api connector's doc for the correct placeholder to use, BTW).
And since it's not a string format specifier, it will work for any type and doesn't need to be quoted for strings; it's the db-api module's job to do proper formatting (including quoting etc.) according to the db column's type.
While we're at it: by all means, NEVER directly use Python string formatting operations when passing values to your queries - unless you want your database to be an open bar for script-kiddies, of course.
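To illustrate the difference (a deliberate sketch with a hypothetical body column; the first form is exactly what NOT to do):

# UNSAFE: the value is spliced into the SQL text, so a crafted field like
# "'); DROP TABLE messages; --" becomes executable SQL.
cur.execute("insert into messages(body) values('{}')".format(value))

# SAFE: the value travels separately from the query; the db-api module
# handles quoting and escaping itself.
cur.execute("insert into messages(body) values(%s)", (value,))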
Upvotes: 1
Reputation: 1930
You can use the cursor's executemany so that you can skip the iteration, but it's slower than the string-joining parameterized approach.
import pandas as pd

df = pd.read_csv('dummy-data.csv')
df.columns = [<define the headers here>]  # you can skip this line if the headers match the column names

try:
    cursor.prepare("insert into public.messages(<Column Names>) values(:1, :2, :3, :4, :5)")
    cursor.executemany(None, df.values.tolist())
    conn.commit()
except Exception:
    conn.rollback()
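Note that the prepare/:1 placeholder style above belongs to drivers like cx_Oracle; psycopg2's cursor has no prepare method and uses %s placeholders. A comparable psycopg2 sketch (the column names here are assumptions, substitute your own) would be:

import pandas as pd

df = pd.read_csv('dummy-data.csv')

try:
    # col1..col5 are placeholder names - replace with your real columns
    cursor.executemany(
        "insert into public.messages(col1, col2, col3, col4, col5) "
        "values (%s, %s, %s, %s, %s)",
        df.values.tolist())
    conn.commit()
except Exception:
    conn.rollback()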
Upvotes: -1
Reputation: 856
You can use string multiplication.
import csv
import psycopg2

conn = psycopg2.connect('postgresql://db_user:db_user_password@server_name:port/db_name')
cur = conn.cursor()

multiple_placeholders = ','.join(['%s'] * 20)

with open('dummy-data.csv', 'r') as f:
    reader = csv.reader(f)
    next(reader)
    for row in reader:
        cur.execute('INSERT INTO public.messages VALUES (' + multiple_placeholders + ')', row)

conn.commit()
Upvotes: 1
Reputation: 7230
The problem lies in the insert itself:
cur.execute('INSERT INTO messages VALUES', (row))
The problem is that, since you are not defining any parameters in the query, it is interpreted as if you literally want to execute INSERT INTO messages VALUES, with no parameters, which causes a syntax error; using a single parameter won't work either, since the query would then expect one parameter instead of multiple parameters.
If you want to create parameters in a more dynamic way, you could try to construct the query string dynamically.
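A minimal sketch of that (assuming every CSV row has the same number of columns, and row and cur are the question's loop variable and cursor):

placeholders = ', '.join(['%s'] * len(row))
cur.execute('INSERT INTO messages VALUES ({})'.format(placeholders), row)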
Please take a look at the documentation: http://initd.org/psycopg/docs/cursor.html#cursor.execute
Upvotes: 1
Reputation: 164
Last time I was struggling to insert CSV data into Postgres, I used pgAdmin and it worked. I don't know whether this answer is a solution, but it's an easy way to get it done.
Upvotes: 0