Aleks
Aleks

Reputation: 418

Insert data from csv to postgreSQL database via Python

I'm brand new to postgreSQL or SQL at all. I'm trying to create a table in a database via Python and then load data from a .csv file into the table. My code looks like this:

import csv
import psycopg2

#Establish connection to database 
con = psycopg2.connect(
        host = "localhost",
        database = "kundeavgang",
        user = "postgres",
        password = "postgres",
        )

#Cursor 
cur = con.cursor()

#If a mistake is made, start from scratch
cur.execute("DROP TABLE IF EXISTS kundeavgang")

#Create table
cur.execute('''
            CREATE TABLE "kundeavgang"(
            "customerID" TEXT,
            "gender" TEXT,
            "SeniorCitizen" TEXT,
            "Partner" TEXT,
            "Dependents" TEXT,
            "tenure" INT,
            "PhoneService" TEXT,
            "MultipleLines" TEXT,
            "InternetService" TEXT,
            "OnlineSecurity" TEXT,
            "DeviceProtection" TEXT,
            "TechSupport" TEXT,
            "StreamingMovies" TEXT,
            "Contract" TEXT,
            "PaperlessBilling" TEXT,
            "PaymentMethod" TEXT,
            "MonthlyCharges" FLOAT,
            "TotalCharges" FLOAT,
            "Churn" TEXT
            )
            ''')

#Acsess .csv file
with open('kundeavgang.csv') as csvFile:
    reader = csv.reader(csvFile)
    skipHeader = next(reader) #Account for header
    for row in reader: 
        customerID = row[0]
        gender = row[1]
        SeniorCitizen = row[2]
        Partner = row[3]
        Dependents = row[4]
        tenure = row[5]
        PhoneService = row[6]
        MultipleLines = row[7]
        InternetService = row[8]
        OnlineSecurity = row[9]
        OnlineBackup = row[10]
        DeviceProtection = row[11]
        TechSupport = row[12]
        StreamingTV = [13]
        StreamingMovies = row[14]
        Contract = row[15]
        PaperlessBilling = row[16]
        PaymentMethod = row[17]
        MonthlyCharges = row[18]
        TotalCharges = row[19]
        Churn = row[20]
        cur.execute('''INSERT INTO kundeavgang(customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,TotalCharges,Churn)
            VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)''',(customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,TotalCharges,Churn))


#Commit the transaction
con.commit()

#End connection
con.close()

In pgAdmin, the table comes up as existing in the database. However, I cannot find the actual table. Further, I have no idea about this line of code:

cur.execute('''INSERT INTO kundeavgang(customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,TotalCharges,Churn)
            VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)''',(customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,TotalCharges,Churn))

What does all the %s stand for? I found it off an online example which was not very helpful, so I tried it without knowing what it means. I have seen some examples where question marks are inserted instead, but also this without explanation.

Lastly, as the code stands now, I get the error message:

VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)''',(customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,TotalCharges,Churn))

IndexError: tuple index out of range

All help or explanations will be appreciated.

Upvotes: 1

Views: 4087

Answers (4)

Joao Paulo
Joao Paulo

Reputation: 1

Change the line reader = csv.reader(csvFile) to:

reader = csv.reader(csvFile, delimiter=';')

Upvotes: 0

Parfait
Parfait

Reputation: 107567

For bulk inserts from text files, consider copy_from or copy_expert of psycopg2. Also, be sure to commit your execution:

cur.execute("DROP TABLE IF EXISTS kundeavgang")
con.commit()

cur.execute('''CREATE TABLE "kundeavgang" ... ''')
con.commit()

with open('kundeavgang.csv') as csvFile:
   next(csvFile) # SKIP HEADERS
   cur.copy_from(csvFile, "kundeavgang", sep=",")

   # POSTGRES COPY COMMAND FOR CSV MODE
   # cur.copy_expert("""COPY "kundeavgang" FROM STDIN WITH CSV""", csvFile)
   con.commit()

Upvotes: 3

snakecharmerb
snakecharmerb

Reputation: 55600

The problem is a mismatch between the number of columns to be populated and the length of the list provided. This is an easy mistake to make when dealing with a lot of columns. One way to reduce risk of error is to use the length of the columns or values list to build the statement.

cols = [name1, name2,...]
vals = [val1, val2, ...]
assert len(cols) == len(vals), 'mismatch between number of columns and number of values'
template = """INSERT INTO tbl ({}) VALUES ({})"""
stmt = template.format(', '.join(cols), ','.join(['%s'] * len(vals)))
cur.execute(stmt, vals)

Note that when building the column names dynamically it's good practice to quote them - psycopg2 provides tools for this.

Upvotes: 2

Gabio
Gabio

Reputation: 9484

The %s are placeholders for the values that will be inserted and passed through the following tuple:

(customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,TotalCharges,Churn)

The problem that your insert statement going to insert to 20 columns, you provide 20 values in your tuple but you have 22 placeholders (%s).

Upvotes: 3

Related Questions