Rose_Trojan
Rose_Trojan

Reputation: 117

CSV read into MySQLdb failing

I am having a problem with reading my csv file into the MySQL database. I have tried a number of solutions, but the errors just keep changing and the code isn't working. This same code had worked with another csv file, so I'm thinking I might be doing something wrong with this one?

Here is my code

from database_access import *
from builtins import bytes, int, str
import codecs
import csv
import requests

from urllib.parse import urlparse, urljoin
from bs4 import BeautifulSoup
import re
import cgi
import MySQLdb
import chardet

# from database_access import *
import MySQLdb
import simplejson



if __name__ == '__main__':

    with open("SIMRA.csv",'r') as file:
        reader = csv.reader(file)
        #reader = csv.reader(text)
        next(reader, None)
        print ("project running")
        #print (row[7])
        #rowlist = []
        all_links = []
        all_project_ids = []


        for row in reader:
            if row[7] != "" and row[16] != "":
                    country = row[2]
                    city = row[8]
                    description = row[11] + '' + row[12]
                    title = row[7].replace("'", "''")
                    link = row[16]
                    #date_start = row[9]

                    #print a check here
                    print(title,description,country, city, link)

                    db = MySQLdb.connect(host, username, password, database, charset='utf8')
                    cursor = db.cursor()
                    new_project = True

                    proj_check = "SELECT * from Projects where ProjectName like '%" + title + "%'"
                    #proj_check = "SELECT * from Projects where ProjectName like %s",(title,)
                    #cur.execute("SELECT * FROM records WHERE email LIKE %s", (search,))
                    cursor.execute(proj_check)
                    num_rows = cursor.rowcount
                    if num_rows != 0:
                        new_project = False

                    url_compare = "SELECT * from Projects where ProjectWebpage like '" + link + "'"
                    #url_compare = "SELECT * from Projects where ProjectWebpage like %s",(link,)
                    cursor.execute(url_compare)
                    num_rows = cursor.rowcount
                    if num_rows != 0:
                        new_project = False

                    if new_project:
                        project_insert = "Insert into Projects (ProjectName,ProjectWebpage,FirstDataSource,DataSources_idDataSources) VALUES (%s,%s,%s,%s)"
                        cursor.execute(project_insert, (title, link,'SIMRA', 5))
                        projectid = cursor.lastrowid
                        print(projectid)



                        #ashoka_projectids.append(projectid)
                        db.commit()

                        ins_desc = "Insert into AdditionalProjectData (FieldName,Value,Projects_idProjects,DateObtained) VALUES (%s,%s,%s,NOW())"
                        cursor.executemany(ins_desc, ("Description", description, str(projectid)))
                        db.commit()

                        ins_location = "Insert into ProjectLocation (Type,Country,City,Projects_idProjects) VALUES (%s,%s,%s,%s)"
                        cursor.execute(ins_location, ("Main", country,city, str(projectid)))
                        db.commit()


                    else:

                        print('Project already exists!')
                        print(title)


                    all_links.append(link)

    #print out SIMRA's links to a file for crawling later
    with open('simra_links', 'w', newline='') as f:
        write = csv.writer(f)
        for row in all_links:
            columns = [c.strip() for c in row.strip(', ').split(',')]
            write.writerow(columns)

When I ran this, I got the following error:

File "/usr/lib/python3.8/codecs.py", line 322, in decode (result, consumed) = self._buffer_decode(data, self.errors, final) UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa3 in position 898: invalid start byte

I did some research and tried handling the encoding error by adding different forms of encoding, as seen here - UnicodeDecodeError: ‘utf8’ codec can’t decode byte 0xa5 in position 0: invalid start byte, and Python MySQLdb TypeError: not all arguments converted during string formatting. Added this in this in the csv open parameter -

with open("SIMRA.csv", 'r', encoding="cp437", errors='ignore') as file:

Running the code with these different encoding options came up with a different error:

MySQLdb._exceptions.ProgrammingError: not all arguments converted during bytes formatting

Further research suggested using tuples or lists in order to address this problem, so I added these in the 'select' function in the code, as suggested here - Python MySQLdb TypeError: not all arguments converted during string formatting and in the Python SQL documentation here - PythonMySqldb

So the select query became:

                   
                    proj_check = "SELECT * from Projects where ProjectName like %s",(title,)
                    cursor.execute(proj_check)
                    num_rows = cursor.rowcount
                    if num_rows != 0:
                        new_project = False

                    
                    url_compare = "SELECT * from Projects where ProjectWebpage like %s",(link,)
                    cursor.execute(url_compare)
                    num_rows = cursor.rowcount
                    if num_rows != 0:
                        new_project = False

When I ran the code, I came up with this Assertion Error and I have no idea what to do anymore.

File "/home/ros/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 205, in execute assert isinstance(query, (bytes, bytearray)) AssertionError

I have run out of ideas. It might be that I'm missing something small, but I can't figure this out now as I've been battling with this for two days now.

Can anyone help point out what I'm missing? It will be greatly appreciated. This code ran perfectly with another csv file. I am running this with Python 3.8 btw.

Upvotes: 2

Views: 87

Answers (2)

Rick James
Rick James

Reputation: 142528

Were you expecting £? You need to specify what the encoding of the file is. It may be "latin1". See the syntax of LOAD DATA for how to specify CHARACTER SET latin1.

Upvotes: 0

Rose_Trojan
Rose_Trojan

Reputation: 117

Have solved this now. I had to use a different encoding with the original code and this solved the problem. So, I changed the csv open parameter to:

with open("SIMRA.csv",'r', encoding="ISO-8859-1") as file:
    reader = csv.reader(file)

Upvotes: 1

Related Questions