Reputation: 45
I'm trying to load a 1 GB CSV file (about 1 million rows) into an Oracle table. The code below works:
import cx_Oracle
import csv

connection1 = cx_Oracle.connect("username", "password", "ORADB")
cur1 = connection1.cursor()
with open("Test.csv", "r") as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    next(csv_reader)  # skip the header row
    i = 0
    sql1 = """insert into TEST_CSV_FILE (START_TS, END_TS, FLDR, TSK, INST, SRC_ERR, TGT_ERR)
              values (:1, :2, :3, :4, :5, :6, :7)"""
    for lines1 in csv_reader:
        print(lines1)
        print(type(lines1))
        cur1.execute(sql1, lines1)
        i = i + 1
        print(i)
cur1.close()
connection1.commit()
connection1.close()
However, the performance has been horrible: the load doesn't complete even after 2 hours. Is there an efficient way to load this file quickly? I went through the documentation and tried "executemany" instead of "execute", but I get the error below when I try executemany:
cur1.executemany(sql1, lines1)
cx_Oracle.DatabaseError: ORA-01036: illegal variable name/number
Any help would be greatly appreciated.
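For context on the ORA-01036 error: `executemany` expects a sequence of rows (e.g. a list of tuples), not a single row. Passing one row (`lines1`) makes cx_Oracle treat each column value as a separate row, so the bind count no longer matches the seven placeholders. A minimal sketch of the expected shape, using an in-memory stand-in for the file (the column values here are invented for illustration):

```python
import csv
import io

# Tiny stand-in for the real CSV file; values are made up.
sample = io.StringIO(
    "START_TS,END_TS,FLDR,TSK,INST,SRC_ERR,TGT_ERR\n"
    "2020-01-01,2020-01-02,f1,t1,i1,0,0\n"
    "2020-01-03,2020-01-04,f2,t2,i2,1,0\n"
)

reader = csv.reader(sample, delimiter=',')
next(reader)  # skip the header row

# executemany wants a list of rows, each row a tuple of 7 bind values.
data = [tuple(row) for row in reader]
assert all(len(row) == 7 for row in data)

# With a live connection and the insert statement sql1, this would be:
# cur1.executemany(sql1, data)
```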
After help from Christopher and abhlb I updated the code as below to batch rows and use executemany. However, the performance is still bad, with the job taking 2 hours at a batch size of 3000. Are there any enhancements I can make to improve the performance?
import cx_Oracle
import csv
from datetime import datetime

start = datetime.now()
connection1 = cx_Oracle.connect("username", "password", "oradb")
cur1 = connection1.cursor()
with open("test_file.csv", "r") as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    batch_size = 3000
    sql1 = """insert into TEST_CSV_FILE (START_TS, END_TS, FLDR, TSK, INST, SRC_ERR, TGT_ERR)
              values (:1, :2, :3, :4, :5, :6, :7)"""
    data = []
    for line in csv_reader:
        data.append((line[0], line[1], line[2], line[3], line[4], line[5], line[6]))
        if len(data) == batch_size:  # flush a full batch
            cur1.executemany(sql1, data)
            data = []
    if data:  # flush the final partial batch
        cur1.executemany(sql1, data)
connection1.commit()
cur1.close()
connection1.close()
print(datetime.now() - start)
Upvotes: 0
Views: 815
Reputation: 5757
You need to convert each row into a tuple of values:
dataToInsert = [tuple(x) for x in csv_reader]
cur1.executemany(sql1, dataToInsert)
cur1.close()
connection1.commit()
connection1.close()
Upvotes: 1