Why this subquery is 10x faster than inner join

Question

I have two tables ecdict and favourite and one view which just inner join two tables.

CREATE TABLE ecdict('_id' INTEGER PRIMARY KEY, 'word' VARCHAR NOT NULL UNIQUE, 'meaning' VARCHAR, 'phonetic_string' VARCHAR, 'example' VARCHAR);
CREATE TABLE favourite('_id' INTEGER PRIMARY KEY, 'word' VARCHAR NOT NULL UNIQUE);

CREATE VIEW favourite_word as select ecdict.* from ecdict INNER JOIN favourite ON ecdict.word = favourite.word;

Here is the execution time of result:
Time: 0.010

select ecdict.* from ecdict where word in (select word from favourite);

Time: 0.226

select * from favourite_word;

Why they differ so much? It if is related to query plan, why sqlite chooses a slower one? How can I guide sqlite to choose the faster one? Thanks

Gordon Linoff · Accepted Answer

You need to look at the query plan to see what is really happening. The following is (informed) speculation.

When you have an inner join with two sets of indexes, you basically have two possible query plans:

Scan the edict table and look up words in the favourites using the index on favourite.word.
Scan the favourite table and look up words in edicts using the index on edicts(word). Then look up the record in edit when there is a match.

(A unique contraint creates an index.)

All these being equal -- that is, with no information about table sizes -- the first approach would be preferred in this case. Why? Because there is no additional step to look up the additional columns.

When you phrase the query using in, I think that SQLite will always use the first approach. In this case, I am guessing that the favourite table is much smaller than the edict table, so the second approach is actually better. When you write the query as a join, both possibilities are considered and the better one is chosen.

Why this subquery is 10x faster than inner join

Answers (1)

Related Questions