connection.commit() performance impact

Question

When parsing huge log files into a sqlite database is there any major performance improve if we don't call connection.commit() after each insert or does that not make any difference. I guess the question is if it's only for separating transactions or if it does anything that needs time.

Alex Martelli · Accepted Answer

Yes, performance decreases directly with more frequent commits.

Here's a toy example working with a "lorem ipsum" text file of about 1400 lines:

import argparse
import sqlite3
import textwrap
import time

parser = argparse.ArgumentParser()
parser.add_argument("n", help="num lines per commit", type=int)
arg = parser.parse_args()

con = sqlite3.connect('lorem.db')
cur = con.cursor()
cur.execute('drop table if exists Lorem')
cur.execute('create table Lorem (lorem STRING)')
con.commit()

with open('lorem.txt') as f: lorem=textwrap.wrap(f.read())
print('{} lines'.format(len(lorem)))

start = time.time()
for i, line in enumerate(lorem):
    cur.execute('INSERT INTO Lorem(lorem) VALUES(?)', (line,))
    if i % arg.n == 0: con.commit()
stend = time.time()

print('{} lines/commit: {:.2f}'.format(arg.n, stend-start))

With this saved as sq.py...:

$ for i in `seq 1 10`; do python sq.py $i; done
1413 lines
1 lines/commit: 1.01
1413 lines
2 lines/commit: 0.53
1413 lines
3 lines/commit: 0.35
1413 lines
4 lines/commit: 0.27
1413 lines
5 lines/commit: 0.21
1413 lines
6 lines/commit: 0.19
1413 lines
7 lines/commit: 0.17
1413 lines
8 lines/commit: 0.14
1413 lines
9 lines/commit: 0.13
1413 lines
10 lines/commit: 0.11

So, halving the commits nearly halves elapsed time for the operation, and so on -- it's not quite linear but almost so.

For completeness: 100 lines per commit reduce runtime to 0.02.

Riffing on this, you can easily experiment, and measure times, with DB tables closer to what you actually require.

connection.commit() performance impact

Answers (1)

Related Questions