Robert McGrabby

Reputation: 13

Inserting data from a CSV file to postgres using SQL

Struggling with this Python issue as I'm new to the language and don't have significant experience with it. I currently have a CSV file containing around 20 headers and the same number of rows, so listing each one out, as in the examples here, is what I'm trying to avoid: https://www.dataquest.io/blog/loading-data-into-postgres/

My code consists of the following so far:

    with open('dummy-data.csv', 'r') as f:
        reader = csv.reader(f)
        next(reader)
        for row in reader:
            cur.execute('INSERT INTO messages VALUES', (row))

I'm getting a syntax error at the end of the input, so I assume it's linked to the way my execute call is written, but I still don't know what to do to address the issue. Any help?

P.S. I understand that people use %s for this, but if that's the case, can it be avoided? I don't want to have it duplicated 20 times in one line.

Upvotes: 0

Views: 672

Answers (6)

jjanes

Reputation: 44137

If you want a single placeholder that covers a whole list of values, you can use a different method, located in "extras", which covers that usage:

    import psycopg2.extras

    psycopg2.extras.execute_values(cur, 'INSERT INTO messages VALUES %s', (row,))

This method can take many rows at a time (which is good for performance), which is why you need to wrap your single row in (...,).
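
For example, a minimal sketch of loading the whole file in one call (the connection details here are made up for illustration):

    import csv
    import psycopg2
    import psycopg2.extras

    # Connection string is a placeholder - adjust for your setup.
    conn = psycopg2.connect('dbname=mydb user=postgres')
    cur = conn.cursor()

    with open('dummy-data.csv', 'r') as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        # execute_values expands the single %s into one VALUES tuple per row
        psycopg2.extras.execute_values(cur, 'INSERT INTO messages VALUES %s', reader)

    conn.commit()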

Upvotes: 1

bruno desthuilliers

Reputation: 77902

Basically, you DO have to specify at least the required placeholders - and preferably the field names too - in your query.

If it's a one-shot affair and you know which fields are in the CSV and in which order, then you can simply hardcode them in the query, i.e.

SQL = "insert into tablename(field1, field2, field21) values(%s, %s, %s)"

Ok, for 20 or so fields it gets quite boring, so you can also use a list of field names to generate the fieldnames part and the placeholders:

fields = ["field1", "field2", "field21"]
placeholders = ["%s"] * len(fields) # list multiplication, yes

SQL = "insert into tablename({}) values({})".format(", ".join(fields), ", ".join(placeholders))

If by chance the CSV header row contains the exact field names, you can also just use this row as the value for fields - but you have to trust the csv then.
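
A hedged sketch combining the two ideas - generated placeholders plus the header row as field names (assuming the header really does match the column names, and with made-up connection details):

    import csv
    import psycopg2

    conn = psycopg2.connect('dbname=mydb user=postgres')  # placeholder connection
    cur = conn.cursor()

    with open('dummy-data.csv', 'r') as f:
        reader = csv.reader(f)
        fields = next(reader)  # trusting the header row to match the column names
        placeholders = ["%s"] * len(fields)
        # NB: the field names are interpolated into the SQL string,
        # hence the trust requirement above
        SQL = "insert into tablename({}) values({})".format(
            ", ".join(fields), ", ".join(placeholders))
        for row in reader:
            cur.execute(SQL, row)

    conn.commit()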

NB: specifying the fields list in the query is not strictly required but it can protect you from possible issues with a malformed csv. Actually, unless you really trust the source (your csv), you should actively validate the incoming data before sending them to the database.

NB2:

%s is for strings I know but would it work the same for timestamps?

In this case, "%s" is not used as a Python string format specifier but as a plain database query placeholder. The choice of the string format specifier here is really unfortunate, as it creates a lot of confusion. Note that this is DB vendor specific though; some vendors use "?" instead, which is much clearer IMHO (and you want to check your own db-api connector's doc for the correct placeholder to use, BTW).

And since it's not a string format specifier, it will work for any type and doesn't need to be quoted for strings; it's the db-api module's job to do proper formatting (including quoting etc.) according to the db column's type.
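
To illustrate (a sketch assuming a hypothetical created_at timestamp column), the exact same placeholder works for a non-string value:

    from datetime import datetime

    # created_at is a made-up column name; psycopg2 adapts the datetime itself,
    # no manual quoting or formatting required.
    cur.execute("insert into messages(created_at) values(%s)", (datetime.now(),))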

While we're at it, by all means, NEVER directly use Python string formatting operations when passing values to your queries - unless you want your database to be open-bar for script-kiddies of course.

Upvotes: 1

redhatvicky

Reputation: 1930

You can use the cursor and executemany so that you can skip the iteration, but it's slower than the string-joining parameterized approach.

import pandas as pd

df = pd.read_csv('dummy-data.csv')
df.columns = [<define the headers here>]  # You can skip this line if headers match column names
try:
    # NB: prepare() with :1-style placeholders follows the cx_Oracle db-api
    # style, not psycopg2's
    cursor.prepare("insert into public.messages(<Column Names>) values(:1, :2, :3, :4, :5)")
    cursor.executemany(None, df.values.tolist())
    conn.commit()
except Exception:
    conn.rollback()
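
For reference, a hedged psycopg2 equivalent (connection setup assumed, and with the placeholder count derived from the frame) would drop the separate prepare step:

    import pandas as pd
    import psycopg2

    conn = psycopg2.connect('dbname=mydb user=postgres')  # made-up connection details
    cursor = conn.cursor()

    df = pd.read_csv('dummy-data.csv')
    placeholders = ', '.join(['%s'] * len(df.columns))
    try:
        # psycopg2 takes the SQL directly in executemany, using %s placeholders;
        # note that numpy scalar types from pandas may need casting to plain
        # Python types before psycopg2 can adapt them
        cursor.executemany(
            'insert into public.messages values ({})'.format(placeholders),
            df.values.tolist())
        conn.commit()
    except Exception:
        conn.rollback()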

Upvotes: -1

Jefferson Houp

Reputation: 856

You can use string multiplication.

import csv
import psycopg2

conn = psycopg2.connect('postgresql://db_user:db_user_password@server_name:port/db_name')
cur = conn.cursor()

multiple_placeholders = ','.join(['%s'] * 20)
with open('dummy-data.csv', 'r') as f:
    reader = csv.reader(f)
    next(reader)
    for row in reader:
        cur.execute('INSERT INTO public.messages VALUES (' + multiple_placeholders + ')', row)

conn.commit()

Upvotes: 1

Haroldo_OK

Reputation: 7230

The problem lies on the insert itself:

    cur.execute('INSERT INTO messages VALUES', (row))

The problem is that, since you are not defining any parameters in the query, it interprets that you literally want to execute INSERT INTO messages VALUES, with no parameters, which causes a syntax error. Using a single parameter won't work either, since it will understand that you want a single parameter instead of multiple ones.

If you want to create parameters in a more dynamic way, you could try to construct the query string dynamically.
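
For instance, a minimal sketch that derives the placeholder count from each row itself (building on the question's own loop):

    for row in reader:
        placeholders = ', '.join(['%s'] * len(row))
        cur.execute('INSERT INTO messages VALUES ({})'.format(placeholders), row)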

Please, take a look at the documentation: http://initd.org/psycopg/docs/cursor.html#cursor.execute

Upvotes: 1

Kartheek

Reputation: 164

The last time I was struggling to insert CSV data into Postgres, I used pgAdmin and it worked. I don't know whether this answer is a solution, but it's an easy way to get along with it.

Upvotes: 0
