Reputation: 1
I have a problem. There are hundreds of CSV files, each about 1,000,000 lines long. I need to move that data into a database in a specific way, but the script runs very slowly (it processes only a few tens of thousands of lines per hour).
My code:
import sqlite3 as lite
import csv
import os

my_file = open('file.csv', 'r')
reader = csv.reader(my_file, delimiter=',')

date = '2014-09-29'

con = lite.connect('test.db', isolation_level='exclusive')

for row in reader:
    position = row[0]
    item_name = row[1]
    cur = con.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS [%s] (Date TEXT, Position INT)" % item_name)
    cur.execute("INSERT INTO [%s] VALUES(?, ?)" % item_name, (date, position))
    con.commit()
I found information about isolation_level and exclusive access to the database, but it didn't help much.
Lines in the CSV files have the structure: 1,item1 | 2,item2
Could anyone help me? Thanks!
Upvotes: 0
Views: 4009
Reputation: 1
I had the same problem. Now it is solved! I would like to share the method with everyone who is facing the same problem.
We use an sqlite3 database as an example; other databases may also work, but I am not sure. We use the pandas and sqlite3 modules in Python.
This can quickly convert a list of CSV files [file1,file2,...] into tables [table1,table2,...].
import pandas as pd
import sqlite3 as sql

DataBasePath = "C:\\Users\\...\\database.sqlite"
conn = sql.connect(DataBasePath)

filePath = "C:\\Users\\...\\filefolder\\"
datafiles = ["file1", "file2", "file3", ...]
for f in datafiles:
    df = pd.read_csv(filePath + f + ".csv")
    df.to_sql(name=f, con=conn, if_exists='append', index=False)
conn.close()
What's more, this code creates the database file if it doesn't exist. The if_exists argument of DataFrame.to_sql() is important. Its default value is "fail", which raises an error if the table already exists (otherwise it creates the table and imports the data); "replace" drops the table first if it exists, then creates a new table and imports the data; "append" inserts into the table if it exists, otherwise creates a new one and imports the data.
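A minimal sketch of the three if_exists behaviors, using an in-memory SQLite database and a made-up one-row DataFrame (the table name items is hypothetical):

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(':memory:')
df = pd.DataFrame({'Date': ['2014-09-29'], 'Position': [1]})

df.to_sql('items', con=conn, if_exists='fail', index=False)     # table doesn't exist yet: created, 1 row
df.to_sql('items', con=conn, if_exists='append', index=False)   # adds a second row
df.to_sql('items', con=conn, if_exists='replace', index=False)  # drops the table, recreates it with 1 row
```

A second call with if_exists='fail' would raise a ValueError, since the table now exists.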
Upvotes: 0
Reputation: 441
The code is inefficient in that it performs two SQL statements for each row of the CSV. Try to optimize.
Is the set of distinct item names limited (i.e. do many rows share the same item name)? If yes, you can accumulate the rows to be inserted into the same table (generate a set of INSERT statements for the same table) and prefix the resulting set of statements with CREATE TABLE IF NOT EXISTS only once, not for every row.
Upvotes: 0
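A sketch of that idea, assuming the same file.csv / test.db layout as the question (here a tiny sample file and an in-memory database stand in for the real data): rows are grouped by item name first, so each table gets one CREATE TABLE followed by its batch of INSERTs.

```python
import csv
import sqlite3
from collections import defaultdict

# Tiny stand-in for one of the large CSV files (same 1,item1 layout as the question).
with open('file.csv', 'w', newline='') as f:
    f.write("1,item1\n2,item2\n3,item1\n")

date = '2014-09-29'

# Accumulate the rows destined for each table.
rows_by_table = defaultdict(list)
with open('file.csv', newline='') as f:
    for position, item_name in csv.reader(f):
        rows_by_table[item_name].append((date, position))

con = sqlite3.connect(':memory:')
cur = con.cursor()
for table, rows in rows_by_table.items():
    # One CREATE TABLE per table, then only INSERTs for that table.
    cur.execute("CREATE TABLE IF NOT EXISTS [%s] (Date TEXT, Position INT)" % table)
    for values in rows:
        cur.execute("INSERT INTO [%s] VALUES(?, ?)" % table, values)
con.commit()
```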
Reputation: 110301
You certainly don't want to create a new cursor object for each row you insert, and checking for table creation on every line will certainly slow you down as well.
I'd suggest doing this in 2 passes: first you create the needed tables, then on the second pass you record the data. If it is still slow, you could build a more sophisticated in-memory collection of the data to be inserted and perform "executemany" - but this would require some sophistication to group the data by item name in memory prior to committing.
import sqlite3 as lite
import csv

my_file = open('file.csv', 'r')
reader = csv.reader(my_file, delimiter=',')

date = '2014-09-29'

con = lite.connect('test.db', isolation_level='exclusive')
cur = con.cursor()

# First pass: create all the needed tables.
table_names = set(row[1] for row in reader)
my_file.seek(0)
for name in table_names:
    cur.execute("CREATE TABLE IF NOT EXISTS [%s] (Date TEXT, Position INT)" % name)

# Second pass: insert the data, committing once at the end.
for row in reader:
    position = row[0]
    item_name = row[1]
    cur.execute("INSERT INTO [%s] VALUES(?, ?)" % item_name, (date, position))
con.commit()
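The "executemany" variant mentioned above could be sketched like this, again with a tiny sample file and an in-memory database standing in for the real data (file names and contents are hypothetical):

```python
import csv
import sqlite3
from collections import defaultdict

# Tiny stand-in for one of the large CSV files.
with open('file.csv', 'w', newline='') as f:
    f.write("1,item1\n2,item2\n3,item2\n4,item1\n")

date = '2014-09-29'

# Group the rows by item name in memory first.
grouped = defaultdict(list)
with open('file.csv', newline='') as f:
    for position, item_name in csv.reader(f):
        grouped[item_name].append((date, position))

con = sqlite3.connect(':memory:')
cur = con.cursor()
for name, rows in grouped.items():
    # One CREATE TABLE and one executemany call per table,
    # instead of two statements per CSV row.
    cur.execute("CREATE TABLE IF NOT EXISTS [%s] (Date TEXT, Position INT)" % name)
    cur.executemany("INSERT INTO [%s] VALUES(?, ?)" % name, rows)
con.commit()
```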
Upvotes: 0
Reputation: 2865
Don't do SQL inserts. Prepare the CSV file first, then run in the sqlite3 shell:
.separator <separator>
.import <loadFile> <tableName>
See here: http://cs.stanford.edu/people/widom/cs145/sqlite/SQLiteLoad.html
Upvotes: 3