okkadu
okkadu

Reputation: 45

Load csv with Python into Oracle

I'm trying to load a 1 GB size csv file (about 1 Million rows) into Oracle table, the code works fine as below:

import cx_Oracle 
import csv

connection1=cx_Oracle.connect("username", "password","ORADB")
cur1 = connection1.cursor()
with open("Test.csv", "r") as csv_file:
csv_reader = csv.reader(csv_file, delimiter=',')
next(csv_reader)
i=0
sql1="""insert into TEST_CSV_FILE ( START_TS, END_TS,FLDR, TSK, INST, SRC_ERR,tgt_ERR) 
        values (:1, :2, :3, :4, :5, :6, :7)"""
list=[]
for lines1 in csv_reader:
    print(lines1)
    print(type(lines1))
    cur1.execute(sql1,lines1)
    i=i+1
    print(i)
cur1.close()
connection1.commit()
connection1.close()

However the performance has been horrible with load not completing even after 2 hours, is there an efficient way to load this file in quick time ? I went through documentation and tried with "executemany" instead of "execute" but I get below when I try with executemany:

    cur1.executemany(sql1,lines1)
cx_Oracle.DatabaseError: ORA-01036: illegal variable name/number

Any help would be greatly appreciated.

After help from Christopher and abhlb I updated code as below to batch and use executemany, however the performance is still bad with Job taking 2 hours with batch size of 3000, any enhancements I can make to improve the performance ?

import cx_Oracle
import csv
from datetime import datetime
start=datetime.now()
connection1=cx_Oracle.connect("username", "password","oradb")
cur1 = connection1.cursor()
with open("test_file.csv", "r") as csv_file:
csv_reader = csv.reader(csv_file, delimiter=',')
i=0
batch_size=3000
sql1="""insert into TEST_CSV_FILE ( START_TS, END_TS,FLDR, TSK, INST, 
SRC_ERR,tgt_ERR) 
    values (:1, :2, :3, :4, :5, :6, :7)"""
data = []
for line in csv_reader:
    data.append((line[0],line[1],line[2],line[3],line[4],line[5],line[6]))
    if len(data) % batch_size == 0:
        cur1.executemany(sql1, data)
        data = []
if data:
    cur1.executemany(sql1, data)
connection1.commit()
cur1.close()
connection1.close()
print(datetime.now()-start)

Upvotes: 0

Views: 815

Answers (1)

abhilb
abhilb

Reputation: 5757

You need to convert each row into a tuple of values

dataToInsert=[tuple(x) for x in csv_reader]
cur1.executemany(sql1,dataToInsert)
cur1.close()
connection1.commit()
connection1.close()

Upvotes: 1

Related Questions