Mats Ekberg

Reputation: 1675

Time for updating a sqlite3 index fluctuates too much

I have a large-ish sqlite3 (3.6.22) database (about 1 GB, 5 million rows) with a single table indexed on one column. The problem is that the time to do a typical INSERT transaction fluctuates widely. I insert about 10000 rows at a time (wrapped in a transaction, of course). Often it takes about 1.5 seconds, but about every fifth transaction suddenly takes several minutes to complete, even though the workload is the same. I've done a lot of experimentation, and I've discovered that the phenomenon occurs only if there is an index, which makes me think that updating the index is what takes so much time.

I need more consistent performance. Somewhat higher average insertion times would be OK, as long as I can avoid some transactions suddenly taking 200x as long as the previous one. What should I do?

Here's the schema. The strings in blocks.md5 are always exactly 32 bytes long and likely unique. The rolling.value column will contain very large 64-bit integers.

CREATE TABLE blocks (blob char(32) NOT NULL, 
                     offset long NOT NULL, 
                     md5 char(32) NOT NULL, 
                     row_md5 char(32));
CREATE TABLE rolling (value INT NOT NULL);

CREATE INDEX index_md5 ON blocks (md5);
CREATE UNIQUE INDEX index_rolling ON rolling (value);

Upvotes: 1

Views: 341

Answers (1)

Tom Kerr

Reputation: 10720

I don't know exactly how sqlite indexes are implemented, but I'd expect the behavior you describe if they were storing the index on disk or reordering the data.

Imagine a scenario where the index is stored in pages, each with N slots for data. When a page fills up, another page has to be allocated and the data split between the two.

When you're inserting your data, the ordering of the MD5 will be as random as it gets, so every page will fill up independently. There isn't any reasonable way for the indexing strategy to know that.

Other databases will even recommend using different indexing strategies than normal for strings, especially in the case of something like random MD5s.

Trying this in an in-memory database would tell you whether the slowdown is algorithmic or caused by disk access.
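A rough sketch of how that comparison could be run with Python's standard sqlite3 module. The schema is the one from the question; the function name time_batches, the batch sizes, and the synthetic MD5 values are my own assumptions for illustration, not your actual workload.

```python
import hashlib
import sqlite3
import time


def time_batches(db_path, batches=5, rows_per_batch=10000):
    """Insert batches of MD5-keyed rows and return the seconds taken per batch."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blocks (blob char(32) NOT NULL, "
        "offset long NOT NULL, md5 char(32) NOT NULL, row_md5 char(32))"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS index_md5 ON blocks (md5)")

    timings = []
    counter = 0
    for _ in range(batches):
        # Synthetic data (assumption): hashing a counter gives keys in
        # effectively random order, like real MD5 values.
        rows = []
        for _ in range(rows_per_batch):
            digest = hashlib.md5(str(counter).encode()).hexdigest()
            rows.append((digest, counter, digest, None))
            counter += 1

        start = time.perf_counter()
        with conn:  # one transaction per batch, committed on exit
            conn.executemany("INSERT INTO blocks VALUES (?, ?, ?, ?)", rows)
        timings.append(time.perf_counter() - start)

    conn.close()
    return timings


if __name__ == "__main__":
    print("in-memory:", time_batches(":memory:"))
```

Calling the same function with a file path instead of ":memory:" gives the on-disk timings to compare against. If the spikes disappear in memory, disk access is the more likely culprit.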

I've only really tried to avoid this in an offline system where I could sort data before inserting. After it was all inserted I would index it and that was as fast as I could find. If you're doing 10k at a time, that might be your use case, though I don't know.
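For what it's worth, the offline approach looks roughly like the sketch below, again with Python's sqlite3 module. The helper name bulk_load_sorted is hypothetical, and it assumes the blocks table and index_md5 index from the question. Whether dropping and rebuilding the index is affordable on a 5 million row table at every batch is something you'd have to measure.

```python
import sqlite3


def bulk_load_sorted(conn, rows):
    """Insert rows sorted by md5 with the index dropped, then rebuild the index."""
    # Each row is (blob, offset, md5, row_md5); position 2 is the md5 column.
    ordered = sorted(rows, key=lambda r: r[2])
    with conn:  # commits on success, rolls back on error
        conn.execute("DROP INDEX IF EXISTS index_md5")
        conn.executemany("INSERT INTO blocks VALUES (?, ?, ?, ?)", ordered)
        # Build the index once, after all rows are in place.
        conn.execute("CREATE INDEX index_md5 ON blocks (md5)")
```

The point of the design is that the index is built in one pass over all the data instead of being updated in random order for every inserted row.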

Upvotes: 1
