How do I choose the best SQL query if there are different ways of accomplishing the same task?

Question

I'm learning SQL (using SQLite 3 and its sqlite3 command-line tool) and I've noticed that I can do some things in several ways, and sometimes it is not clear which one is better. Here are three queries which do the same thing, one executed through intersect, another through inner join and distinct, the last one similar to the second one but it incorporates filtering through where. (The first one was written by the author of the book I'm reading, and the others I wrote myself.)

The question is, which of these queries is better and why? And, more generally, how can I know when one query is better than another? Are there some guidelines I missed or perhaps I should learn SQLite internals despite the declarative nature of SQL?

(In the following example, there are tables that describe food names that are mentioned in some TV series. Foods_episodes is many-to-many linking table while others describe food names and episode names together with season number. Note that all-time ten top foods (based on the count of their appearances in all series) are being looked for, not just top foods in seasons 3..5)

-- task
--     find the all-time top ten foods that appear in seasons 3 through 5
-- schema
--     CREATE TABLE episodes (
--       id integer primary key,
--       season int,
--       name text );
--     CREATE TABLE foods(
--       id integer primary key,
--       name text );
--     CREATE TABLE foods_episodes(
--       food_id integer,
--       episode_id integer );



select f.* from foods f 
inner join 
    (select food_id, count(food_id) as count 
        from foods_episodes 
        group by food_id 
        order by count(food_id) desc limit 10) top_foods 
    on f.id=top_foods.food_id 
intersect 
select f.* from foods f 
    inner join foods_episodes fe on f.id = fe.food_id 
    inner join episodes e on fe.episode_id = e.id 
where
    e.season between 3 and 5 
order by
    f.name; 



select
    distinct f.*
from
    foods_episodes as fe
    inner join episodes as e on e.id = fe.episode_id
    inner join foods as f on fe.food_id = f.id
    inner join (select food_id from foods_episodes
        group by food_id order by count(*) desc limit 10) as lol
        on lol.food_id = fe.food_id
where
    e.season between 3 and 5
order by 
    f.name;



select
    distinct f.*
from
    foods_episodes as fe
    inner join episodes as e on e.id = fe.episode_id
    inner join foods as f on fe.food_id = f.id
where
    fe.food_id in (select food_id from foods_episodes
        group by food_id order by count(*) desc limit 10)
    and e.season between 3 and 5
order by 
    f.name;

-- output (same for these thee):

-- id          name      
-- ----------  ----------
-- 4           Bear Claws
-- 146         Decaf Capp
-- 153         Hennigen's
-- 55          Kasha     
-- 94          Ketchup   
-- 164         Naya Water
-- 317         Pizza     
-- CPU Time: user 0.000000 sys 0.000000

Girish Rao · Accepted Answer

Similar to MySQL, it looks like SQLlite has an EXPLAIN command. Prepend your select with the EXPLAIN keyword and it will return information about the query, including the number of rows scanned, and the indexes used.

http://www.sqlite.org/lang_explain.html

By running EXPLAIN on various selects you can determine which queries (and sub-queries) are more efficient than others.

And here is a general overview of SQLlite's query planner and optimization: http://sqlite.org/optoverview.html

SQLlite3 also supports a callback function to trace queries. You have to implement it though: http://www.sqlite.org/c3ref/profile.html

How do I choose the best SQL query if there are different ways of accomplishing the same task?

Answers (2)

Related Questions