noob

Reputation: 9212

connection.commit() performance impact

When parsing huge log files into a SQLite database, is there any major performance improvement if we don't call connection.commit() after each insert, or does that not make any difference? I guess the question is whether commit() only separates transactions or whether it also does something that takes time.

Upvotes: 1

Views: 1189

Answers (1)

Alex Martelli

Reputation: 882171

Yes, performance decreases directly with more frequent commits.

Here's a toy example working with a "lorem ipsum" text file of about 1400 lines:

import argparse
import sqlite3
import textwrap
import time

parser = argparse.ArgumentParser()
parser.add_argument("n", help="num lines per commit", type=int)
arg = parser.parse_args()

con = sqlite3.connect('lorem.db')
cur = con.cursor()
cur.execute('drop table if exists Lorem')
cur.execute('create table Lorem (lorem TEXT)')
con.commit()

# read the sample text and wrap it into a list of lines
with open('lorem.txt') as f:
    lorem = textwrap.wrap(f.read())
print('{} lines'.format(len(lorem)))

start = time.time()
for i, line in enumerate(lorem):
    cur.execute('INSERT INTO Lorem(lorem) VALUES(?)', (line,))
    # commit once every arg.n inserted lines
    if (i + 1) % arg.n == 0:
        con.commit()
stend = time.time()

# flush any rows left over from the final partial batch
con.commit()
con.close()

print('{} lines/commit: {:.2f}'.format(arg.n, stend - start))

With this saved as sq.py...:

$ for i in `seq 1 10`; do python sq.py $i; done
1413 lines
1 lines/commit: 1.01
1413 lines
2 lines/commit: 0.53
1413 lines
3 lines/commit: 0.35
1413 lines
4 lines/commit: 0.27
1413 lines
5 lines/commit: 0.21
1413 lines
6 lines/commit: 0.19
1413 lines
7 lines/commit: 0.17
1413 lines
8 lines/commit: 0.14
1413 lines
9 lines/commit: 0.13
1413 lines
10 lines/commit: 0.11

So, halving the number of commits (two lines per commit instead of one) nearly halves the elapsed time, and so on -- it's not quite linear, but almost: the per-commit overhead dominates the per-row cost here, so total time falls off roughly in proportion to the number of commits.

For completeness: 100 lines per commit reduces the runtime to 0.02 seconds.
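
At the other extreme, the whole load can go into a single transaction with exactly one commit at the end. As a rough sketch (reusing the lorem list, cur, and con from the script above, with the table freshly recreated), the sqlite3 connection can be used as a context manager, which commits on success and rolls back on error:

start = time.time()
with con:  # one transaction for the whole load: a single commit at the end
    for line in lorem:
        cur.execute('INSERT INTO Lorem(lorem) VALUES(?)', (line,))
print('single commit for all lines: {:.2f}'.format(time.time() - start))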

Riffing on this, you can easily experiment and measure timings with table schemas and batch sizes closer to what you actually need -- for example, along the lines of the sketch below.
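
Here is one minimal sketch of what that could look like for the log-file case in the question; 'logfile.log', the one-column Log table, and the BATCH size are placeholders for your real log format and data, not anything prescribed above. It reads the file in chunks, inserts each chunk with executemany, and commits once per chunk:

import sqlite3
from itertools import islice

BATCH = 10_000  # lines per commit; tune against your own measurements

con = sqlite3.connect('logs.db')
con.execute('create table if not exists Log (line TEXT)')

# hypothetical file name and schema; the point is simply: insert many
# rows, commit once per batch rather than once per row
with open('logfile.log') as f:
    while True:
        batch = list(islice(f, BATCH))
        if not batch:
            break
        con.executemany('INSERT INTO Log(line) VALUES(?)',
                        ((line.rstrip('\n'),) for line in batch))
        con.commit()

con.close()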

Upvotes: 3
