MARY
MARY

Reputation: 41

Sqlite improve case-when and group by performance

I'm optimizing my query using SQLite3. There are some "CASE WHEN", "GROUP BY", "COUNT" functions.

BUT the query is VERY Slow (about 14sec)

Here is my database file information.

  1. size: about 2GB
  2. rows : about 3 millions
  3. columns : 55 columns

What can i do for optimizing the query's performance?

Is there any better query for the result?

Please help me TT Thanks.

select
  case
    when score = 100 then 'A'
    when score < 100 and score >= 40 then 'B'
    else 'C'
  end as range,
  count(*) as count
from grade_info
where type < 9 and 
   (date >= '2019-07-09 00:00:00' and date <= '2019-07-09 23:59:59') and 
   is_new = 1
group by
  case
    when score = 100 then 'A'
    when score < 100 and score >= 40 then 'B'
    else 'C'
  end;

Table grade_info has multi-column index: (type, date, is_new, score)

The conditions for columns (type, date, is_new) are always used in this query. Here is the explain query plan result.

 selectid | order | from | detail
 --------------------------------
 0          0       0      SEARCH TABLE grade_info USING INDEX idx_03 (type<?) (~2777 rows) 
 0          0       0      USE TEMP B-TREE FOR GROUP BY 

and I want the result like this.

A | 5124
B | 124
C | 12354

Upvotes: 1

Views: 954

Answers (1)

C Perkins
C Perkins

Reputation: 3886

  • As Shawn suggests, try changing the index to have the date column as the first column:

    CREATE INDEX [idx_cover] ON [grade_info] ([date], [is_new], [type], [score]);
    
  • sqlite allows reference to aliased expressions in the WHERE and GROUP BY clauses, so you can simply say GROUP BY range rather than repeating the CASE statement. This probably won't change the efficiency, but makes the query shorter and more readable.

  • If you run ANALYZE as MikeT suggests, the execution plan should change to say "COVERING INDEX...". If I understand correctly, that indicates that the entire query can be executed by traversing the single multi-column index without going back to the table data.

  • Try date BETWEEN '2019-07-09 00:00:00' AND '2019-07-09 23:59:59'.

  • Finally, CASE... WHEN is short-circuited, so make sure to place the more likely cases first so that it avoids unnecessary calculations. Also eliminate redundant conditional checks. If you have already checked a certain range in a previous condition, no need to re-evaluate that range in the next condition. (If you have already ruled out score = 100, then it is unnecessary to check score < 100 since it will of course be less than 100... assuming that all scores are ensured to be in the range 0 to 100) For instance, if scores are uniformly distributed, then the following could be faster, possibly eliminating +17000 conditional checks.

SELECT
  CASE
    WHEN score < 40 then 'C'
    WHEN score < 100 then 'B'  -- already tested to be >= 40
    ELSE 'A'                   -- already tested to be >= 100
  END AS range,
  count(*) AS count
FROM grade_info
WHERE type < 9 AND
      (date BETWEEN '2019-07-09 00:00:00' AND '2019-07-09 23:59:59') AND
      is_new = 1       
GROUP BY
  range;

Upvotes: 1

Related Questions