user664481
user664481

Reputation: 2311

Python Commit SQL Statment return disk/IO error

I would like to ask why below statement gave an disk I/O message if commit was executed inside for loop ? This would not give an error if commit was executed outside the for loop.

Error message:

Traceback (most recent call last):
File "D:\Dropbox\Public\EBOOK\Python\Learn\SQLITE_DB\ParsedJSON\ParsedJSON.py"
, line 71, in <module>
conn.commit()
sqlite3.OperationalError: disk I/O error

Code

for entry in json_data:

  name = entry[0];
  title = entry[1];
  role = entry[2];

  print name, title, role

  cur.execute('''INSERT OR IGNORE INTO User (name) 
    VALUES ( ? )''', ( name, ) )
  cur.execute('SELECT id FROM User WHERE name = ? ', (name, ))
  user_id = cur.fetchone()[0]

  cur.execute('''INSERT OR IGNORE INTO Course (title) 
    VALUES ( ? )''', ( title, ) )
  cur.execute('SELECT id FROM Course WHERE title = ? ', (title, ))
  course_id = cur.fetchone()[0]

  cur.execute('''INSERT OR REPLACE INTO Member
    (user_id, course_id, role) VALUES ( ?, ?, ? )''', 
    ( user_id, course_id, role ) )

  conn.commit()

Complete code

import json
import sqlite3

conn = sqlite3.connect('rosterdb.sqlite')
cur = conn.cursor()

# Do some setup
cur.executescript('''
DROP TABLE IF EXISTS User;
DROP TABLE IF EXISTS Member;
DROP TABLE IF EXISTS Course;

CREATE TABLE User (
id     INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
name   TEXT UNIQUE
);

CREATE TABLE Course (
id     INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
title  TEXT UNIQUE
);

CREATE TABLE Member (
user_id     INTEGER,
course_id   INTEGER,
role        INTEGER,
PRIMARY KEY (user_id, course_id)
)
''')

fname = raw_input('Enter file name: ')
if ( len(fname) < 1 ) : fname = 'roster_data.json'

str_data = open(fname).read()
json_data = json.loads(str_data)

for entry in json_data:

  name = entry[0];
  title = entry[1];
  role = entry[2];

  print name, title, role

  cur.execute('''INSERT OR IGNORE INTO User (name) 
    VALUES ( ? )''', ( name, ) )
  cur.execute('SELECT id FROM User WHERE name = ? ', (name, ))
user_id = cur.fetchone()[0]

  cur.execute('''INSERT OR IGNORE INTO Course (title) 
    VALUES ( ? )''', ( title, ) )
  cur.execute('SELECT id FROM Course WHERE title = ? ', (title, ))
  course_id = cur.fetchone()[0]

  cur.execute('''INSERT OR REPLACE INTO Member
    (user_id, course_id, role) VALUES ( ?, ?, ? )''', 
    ( user_id, course_id, role ) )

  conn.commit()

Upvotes: 2

Views: 759

Answers (1)

Chris Larson
Chris Larson

Reputation: 1724

I was curious how much impact the commit I/O actually had, so I ran a few tests with some mocked-up JSON matching the following format:

{
  "folks": [
    {
      "name": "Foghorn Leghorn", 
      "title": "Principal", 
      "role": "Administration"
    }
 ]

}

My results were enlightening. These are averaged times over 10 runs of the test:

with commit() outside loop: 8.960046267508 seconds for 50 Records

and

with commit() outside loop: 0.3031771421432 for 50 Records

I ran the test with ~100,000 records.

With the .commit() outside the loop:

15.2660858631 seconds for 102150 records

With the .commit() inside the loop:

23.81681369933333 MINUTES for 102150 records

Understand that with the .commit() outside the loop, this example also involves 102150 writes to disk. When the commit() is outside, it defers writing to your database until all of your operations are complete and buffered. Inside, it immediately writes to the database file after each iteration completes.

In addition, between each commit(), sqlite is creating a journal file (in this case, 'rosterdb.sqlite-journal'), so you're also creating and deleting this additional file for every commit(), which multiplies the impact on your hardware and your performance. (If you're curious, you can watch this file appear and disappear each iteration in whichever directory your database exists.)

So putting the commit() outside is way, way faster and easier on your hardware. As to why it is returning Disk I/O errors, I'd say it's tied to the speed and frequency of those commits().

Upvotes: 1

Related Questions