Reputation: 2311
I would like to ask why below statement gave an disk I/O message if commit was executed inside for loop ? This would not give an error if commit was executed outside the for loop.
Error message:
Traceback (most recent call last):
File "D:\Dropbox\Public\EBOOK\Python\Learn\SQLITE_DB\ParsedJSON\ParsedJSON.py"
, line 71, in <module>
conn.commit()
sqlite3.OperationalError: disk I/O error
Code
for entry in json_data:
name = entry[0];
title = entry[1];
role = entry[2];
print name, title, role
cur.execute('''INSERT OR IGNORE INTO User (name)
VALUES ( ? )''', ( name, ) )
cur.execute('SELECT id FROM User WHERE name = ? ', (name, ))
user_id = cur.fetchone()[0]
cur.execute('''INSERT OR IGNORE INTO Course (title)
VALUES ( ? )''', ( title, ) )
cur.execute('SELECT id FROM Course WHERE title = ? ', (title, ))
course_id = cur.fetchone()[0]
cur.execute('''INSERT OR REPLACE INTO Member
(user_id, course_id, role) VALUES ( ?, ?, ? )''',
( user_id, course_id, role ) )
conn.commit()
Complete code
import json
import sqlite3
conn = sqlite3.connect('rosterdb.sqlite')
cur = conn.cursor()
# Do some setup
cur.executescript('''
DROP TABLE IF EXISTS User;
DROP TABLE IF EXISTS Member;
DROP TABLE IF EXISTS Course;
CREATE TABLE User (
id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
name TEXT UNIQUE
);
CREATE TABLE Course (
id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
title TEXT UNIQUE
);
CREATE TABLE Member (
user_id INTEGER,
course_id INTEGER,
role INTEGER,
PRIMARY KEY (user_id, course_id)
)
''')
fname = raw_input('Enter file name: ')
if ( len(fname) < 1 ) : fname = 'roster_data.json'
str_data = open(fname).read()
json_data = json.loads(str_data)
for entry in json_data:
name = entry[0];
title = entry[1];
role = entry[2];
print name, title, role
cur.execute('''INSERT OR IGNORE INTO User (name)
VALUES ( ? )''', ( name, ) )
cur.execute('SELECT id FROM User WHERE name = ? ', (name, ))
user_id = cur.fetchone()[0]
cur.execute('''INSERT OR IGNORE INTO Course (title)
VALUES ( ? )''', ( title, ) )
cur.execute('SELECT id FROM Course WHERE title = ? ', (title, ))
course_id = cur.fetchone()[0]
cur.execute('''INSERT OR REPLACE INTO Member
(user_id, course_id, role) VALUES ( ?, ?, ? )''',
( user_id, course_id, role ) )
conn.commit()
Upvotes: 2
Views: 759
Reputation: 1724
I was curious how much impact the commit I/O actually had, so I ran a few tests with some mocked-up JSON matching the following format:
{
"folks": [
{
"name": "Foghorn Leghorn",
"title": "Principal",
"role": "Administration"
}
]
}
My results were enlightening. These are averaged times over 10 runs of the test:
with commit() outside loop: 8.960046267508 seconds for 50 Records
and
with commit() outside loop: 0.3031771421432 for 50 Records
I ran the test with ~100,000 records.
With the .commit() outside the loop:
15.2660858631 seconds for 102150 records
With the .commit() inside the loop:
23.81681369933333 MINUTES for 102150 records
Understand that with the .commit() outside the loop, this example also involves 102150 writes to disk. When the commit() is outside, it defers writing to your database until all of your operations are complete and buffered. Inside, it immediately writes to the database file after each iteration completes.
In addition, between each commit(), sqlite is creating a journal file (in this case, 'rosterdb.sqlite-journal'), so you're also creating and deleting this additional file for every commit(), which multiplies the impact on your hardware and your performance. (If you're curious, you can watch this file appear and disappear each iteration in whichever directory your database exists.)
So putting the commit() outside is way, way faster and easier on your hardware. As to why it is returning Disk I/O errors, I'd say it's tied to the speed and frequency of those commits().
Upvotes: 1