how to improve the performance of query with where and group by in sqlite

Question

There are about 2 million of records in a table called log right now. The query performance becomes not acceptable, but I don't want to split the table to different partition in current stage. Thus, I try to add some indexes in order to increase the query performance.

CREATE TABLE log
           (
                id Integer primary key autoincrement,
                app_id text,
                __key__id INTEGER,

                secret text,
                trace_code text,
                url text,

                action text,

                facebook_id text,
                ip text,

                tw_time timestamp,
                time timestamp,

                tag text,
                to_url text,

                from_url text,
                referer text,

                weight integer,
                Unique(app_id, __key__id)
            );
CREATE INDEX key1 on log (action, url, tag);

However, it looks like sqlite just ignore my index but scan the whole table. Did I miss anything?

sqlite> explain query plan select count(*) from log where action like 'content_%
';
0|0|0|SCAN TABLE log (~1182357 rows)


sqlite> explain query plan select count(*) from log where action like 'content_%
' group by url, tag;
0|0|0|SCAN TABLE log (~1182357 rows)
0|0|0|USE TEMP B-TREE FOR GROUP BY

EDIT1

@MaxSem Thanks, I tried and find out it become much better while I change the query to:

sqlite> explain query plan select count(*) from log indexed by key1 where action
 in ('content_click','content_mouseover', 'content_display');
0|0|0|SEARCH TABLE log USING COVERING INDEX key1 (action=?) (~886770 rows)
0|0|0|EXECUTE LIST SUBQUERY 1

However, I cannot explain the reason why Sqlite cannot handle original query.

EDIT2

I should change my question. Is there a way to speed up this kind of query in sqlite?

ddevienne · Accepted Answer

I believe that SQLite cannot use an indexed column for queries with a LIKE predicate, even with starts with LIKE queries. But you can emulate this kind of LIKE predicate with inequalities:

sqlite> create table t (action text, url text, tag text);
sqlite> insert into t values ('click', 'foo', 'bar');
sqlite> insert into t values ('clack', 'foo', 'bar');
sqlite> insert into t values ('clock', 'foo', 'bar');
sqlite> insert into t values ('cluck', 'foo', 'bar');
sqlite> insert into t values ('cleck', 'foo', 'bar');
sqlite> insert into t values ('clyck', 'foo', 'bar');
sqlite> create index t_index on t (action, url, tag);

With the starts with LIKE, you do get a full scan:

sqlite> explain query plan select count(*) from t where action like 'cl%';
0|0|0|SCAN TABLE t (~500000 rows)

But with inequalities, the index is used:

sqlite> explain query plan select count(*) from t where action >= 'cl' and action < 'cm';
0|0|0|SEARCH TABLE t USING COVERING INDEX t_index (action>? AND action

The caveat of this technique is that you must be careful in choosing lower bound ('cm' here) such that lexicographic ordering gives you what you expect, and this technique cannot model easily all LIKE predicates.

When you use an IN predicate with tokens to match exactly, then of course the index is used, since you areback to using an equality:

sqlite> explain query plan select count(*) from t where action in ('click', 'clack');
0|0|0|SEARCH TABLE t USING COVERING INDEX t_index (action=?) (~20 rows)
0|0|0|EXECUTE LIST SUBQUERY 1


If you want to still use LIKE, you can join your table with a pre-selected list of the action values you want from a sub-query:

sqlite> explain query plan select count(*) from t join (select distinct action from t where action like 'cl%') a where t.action = a.action;
1|0|0|SCAN TABLE t USING COVERING INDEX t_index (~500000 rows)
0|0|1|SCAN SUBQUERY 1 AS a (~500000 rows)
0|1|0|SEARCH TABLE t USING COVERING INDEX t_index (action=?) (~10 rows)


This basically gives you a dynamic IN predicate if you want, instead of listing all possible values explicitly.

Of course here, I'm extracting the list of actions which match the LIKE query from your big table, so that's a full scan by itself, negating all the benefit, but if you normalized your schema to have a separate (much smaller) action table with a small integer surrogate key that you'd use as your action column in your log table, then you'd avoid storing repeating strings many times (replaced by small integers) in your log table, and you'd be able to easily join them for your query.

how to improve the performance of query with where and group by in sqlite

EDIT1

EDIT2

Answers (2)

Related Questions