Reputation: 2568
I have a Python import script that reads a CSV and loads it into a MySQL DB. It takes around 45 seconds to import 20k rows. The problem is that I now have a huge file (4 million rows), and by the time the script reaches roughly line 200k of the file, importing the same 20k rows takes about 200 seconds.
I don't know whether that's a DB issue or the file reading itself. Here is my script:
import csv
import mysql.connector
from mysql.connector import Error
import argparse
from itertools import islice

parser = argparse.ArgumentParser()
parser.add_argument('file', type=argparse.FileType('r'))
args = parser.parse_args()

def parseData(row):
    sql = """INSERT INTO TABLE
             (column1,column2)
             VALUES( '%s','%s');""" % (
        row['col1'], row['col2'],)
    return sql

def dataImport(filename, connection, cursor):
    data = csv.DictReader(filename, delimiter=';')
    i = 0
    for row in data:
        sql = parseData(row)
        try:
            result = cursor.execute(sql)
            pass
        except Error as e:
            pass
        if i % 20000 == 0:
            connection.commit()
        i = i + 1

def importa(file):
    try:
        connection = mysql.connector.connect(host=host,
                                             database=db,
                                             user=user,
                                             password=password)
        if connection.is_connected():
            db_Info = connection.get_server_info()
            print("Connected to MySQL Server version ", db_Info)
            cursor = connection.cursor()
            dataImport(file, connection, cursor)
            cursor.close()
    except Error as e:
        print("Error while connecting to MySQL", e)

if __name__ == "__main__":
    importa(args.file)
Upvotes: 1
Views: 868
Reputation: 3807
Using executemany() to send a single multi-row INSERT of 20K rows instead of 20K individual INSERT statements will optimize the transfer, but the database itself could also be a factor in the slow performance. There is a code example in the MySQL Connector/Python documentation for executemany().
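As a minimal sketch, a drop-in replacement for dataImport() built around executemany() could look like the following. It reuses the placeholder table and column names from the question, and the batch size of 20,000 is just the value the question already commits on (a smaller batch may be needed if max_allowed_packet is small):

import csv
import mysql.connector

BATCH_SIZE = 20000
# "TABLE", column1/column2 and col1/col2 are the placeholders from the question.
INSERT_SQL = "INSERT INTO TABLE (column1, column2) VALUES (%s, %s)"

def dataImport(csvfile, connection, cursor):
    reader = csv.DictReader(csvfile, delimiter=';')
    batch = []
    for row in reader:
        batch.append((row['col1'], row['col2']))
        if len(batch) >= BATCH_SIZE:
            # One multi-row INSERT per batch, then one commit.
            cursor.executemany(INSERT_SQL, batch)
            connection.commit()
            batch = []
    if batch:
        # Flush whatever is left over at the end of the file.
        cursor.executemany(INSERT_SQL, batch)
        connection.commit()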
Based on our exchange of comments, the database is the ultimate culprit in the slow performance. One idea that I've seen is to disable indexing until after everything is loaded:
ALTER TABLE table DISABLE KEYS;
... executemany called a bunch of times...
ALTER TABLE table ENABLE KEYS;
That might be worth a try.
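Roughly, wrapping the load with those statements could look like the sketch below, using the same placeholder table name and the executemany()-based dataImport() from the earlier example. Note that DISABLE KEYS affects non-unique MyISAM indexes; depending on the storage engine it may be accepted but have no effect.

# Hypothetical wrapper around the batched import sketched above;
# "table" is the placeholder name from this answer.
cursor.execute("ALTER TABLE table DISABLE KEYS")
dataImport(args.file, connection, cursor)   # bulk insert via executemany()
cursor.execute("ALTER TABLE table ENABLE KEYS")
connection.commit()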
Upvotes: 1