Reputation: 1
I have a problem. There are hundreds of CSV files, each about 1,000,000 lines long. I need to move that data into a database in a specific way, but the script runs very slowly (it processes only a few tens of thousands of lines per hour).
My code:
import sqlite3 as lite
import csv
import os

my_file = open('file.csv', 'r')
reader = csv.reader(my_file, delimiter=',')

date = '2014-09-29'

con = lite.connect('test.db', isolation_level='exclusive')

for row in reader:
    position = row[0]
    item_name = row[1]
    cur = con.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS [%s] (Date TEXT, Position INT)" % item_name)
    cur.execute("INSERT INTO [%s] VALUES(?, ?)" % item_name, (date, position))
    con.commit()
I found information about isolation_level and exclusive access to the database, but it didn't help much.
Lines in the CSV files have the structure: 1,item1 | 2,item2
Could anyone help me? Thanks!
Upvotes: 0
Views: 4009
Reputation: 1
I had the same problem. Now it is solved! I would like to share the method with everyone who is facing the same problem.
We use an sqlite3 database as an example; other databases may also work, but I am not sure. We use the pandas and sqlite3 modules in Python.
This can quickly convert a list of CSV files [file1,file2,...] into tables [table1,table2,...].
import pandas as pd
import sqlite3 as sql

DataBasePath = "C:\\Users\\...\\database.sqlite"
conn = sql.connect(DataBasePath)

filePath = "C:\\Users\\...\\filefolder\\"
datafiles = ["file1", "file2", "file3", ...]
for f in datafiles:
    df = pd.read_csv(filePath + f + ".csv")
    df.to_sql(name=f, con=conn, if_exists='append', index=False)
conn.close()
What's more, this code creates the database file if it doesn't exist. The if_exists argument of DataFrame.to_sql() is important. Its default value is "fail", which raises an error if the table already exists (otherwise it creates the table and imports the data); "replace" drops the table first if it exists, then creates a new table and imports the data; "append" inserts into the table if it exists, otherwise creates a new one and imports the data.
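A minimal sketch of the three if_exists behaviors, using an in-memory SQLite database and a made-up one-row DataFrame (the table name items is hypothetical):

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(':memory:')
df = pd.DataFrame({'Date': ['2014-09-29'], 'Position': [1]})

df.to_sql('items', con=conn, if_exists='fail', index=False)     # table doesn't exist yet: created, 1 row
df.to_sql('items', con=conn, if_exists='append', index=False)   # adds a second row
df.to_sql('items', con=conn, if_exists='replace', index=False)  # drops the table, recreates it with 1 row
```

A second call with if_exists='fail' would raise a ValueError, since the table now exists.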
Upvotes: 0
Reputation: 441
The code is inefficient in that it performs two SQL statements for each row of the CSV. Try to optimize.
Is the set of distinct item names limited (i.e. do many rows share the same item name)? If yes, you can accumulate the rows to be inserted into the same table (generate a set of INSERT statements for the same table) and prefix the resulting set of statements with CREATE TABLE IF NOT EXISTS only once, not for every row.
Upvotes: 0
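A sketch of that idea, assuming the same file.csv / test.db layout as the question (here a tiny sample file and an in-memory database stand in for the real data): rows are grouped by item name first, so each table gets one CREATE TABLE followed by its batch of INSERTs.

```python
import csv
import sqlite3
from collections import defaultdict

# Tiny stand-in for one of the large CSV files (same 1,item1 layout as the question).
with open('file.csv', 'w', newline='') as f:
    f.write("1,item1\n2,item2\n3,item1\n")

date = '2014-09-29'

# Accumulate the rows destined for each table.
rows_by_table = defaultdict(list)
with open('file.csv', newline='') as f:
    for position, item_name in csv.reader(f):
        rows_by_table[item_name].append((date, position))

con = sqlite3.connect(':memory:')
cur = con.cursor()
for table, rows in rows_by_table.items():
    # One CREATE TABLE per table, then only INSERTs for that table.
    cur.execute("CREATE TABLE IF NOT EXISTS [%s] (Date TEXT, Position INT)" % table)
    for values in rows:
        cur.execute("INSERT INTO [%s] VALUES(?, ?)" % table, values)
con.commit()
```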
Reputation: 110301
You certainly don't want to create a new cursor object for each row you insert, and checking for table creation on every line will certainly slow you down as well.
I'd suggest doing this in 2 passes: first you create the needed tables, then on the second pass you record the data. If it is still slow, you could build a more sophisticated in-memory collection of the data to be inserted and perform "executemany" - but this would require some sophistication to group the data by item name in memory prior to committing.
import sqlite3 as lite
import csv

my_file = open('file.csv', 'r')
reader = csv.reader(my_file, delimiter=',')

date = '2014-09-29'

con = lite.connect('test.db', isolation_level='exclusive')
cur = con.cursor()

# First pass: create all the needed tables.
table_names = set(row[1] for row in reader)
my_file.seek(0)
for name in table_names:
    cur.execute("CREATE TABLE IF NOT EXISTS [%s] (Date TEXT, Position INT)" % name)

# Second pass: insert the data, committing once at the end.
for row in reader:
    position = row[0]
    item_name = row[1]
    cur.execute("INSERT INTO [%s] VALUES(?, ?)" % item_name, (date, position))
con.commit()
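The "executemany" variant mentioned above could be sketched like this, again with a tiny sample file and an in-memory database standing in for the real data (file names and contents are hypothetical):

```python
import csv
import sqlite3
from collections import defaultdict

# Tiny stand-in for one of the large CSV files.
with open('file.csv', 'w', newline='') as f:
    f.write("1,item1\n2,item2\n3,item2\n4,item1\n")

date = '2014-09-29'

# Group the rows by item name in memory first.
grouped = defaultdict(list)
with open('file.csv', newline='') as f:
    for position, item_name in csv.reader(f):
        grouped[item_name].append((date, position))

con = sqlite3.connect(':memory:')
cur = con.cursor()
for name, rows in grouped.items():
    # One CREATE TABLE and one executemany call per table,
    # instead of two statements per CSV row.
    cur.execute("CREATE TABLE IF NOT EXISTS [%s] (Date TEXT, Position INT)" % name)
    cur.executemany("INSERT INTO [%s] VALUES(?, ?)" % name, rows)
con.commit()
```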
Upvotes: 0
Reputation: 2865
Don't do SQL inserts. Prepare the CSV file first, then run in the sqlite3 shell:
.separator <separator>
.import <loadFile> <tableName>
See here: http://cs.stanford.edu/people/widom/cs145/sqlite/SQLiteLoad.html
Upvotes: 3