Reputation: 41
I'm optimizing my query using SQLite3. There are some "CASE WHEN", "GROUP BY", "COUNT" functions.
BUT the query is VERY Slow (about 14sec)
Here is my database file information.
- size: about 2GB
- rows : about 3 millions
- columns : 55 columns
What can i do for optimizing the query's performance?
Is there any better query for the result?
Please help me TT Thanks.
select
case
when score = 100 then 'A'
when score < 100 and score >= 40 then 'B'
else 'C'
end as range,
count(*) as count
from grade_info
where type < 9 and
(date >= '2019-07-09 00:00:00' and date <= '2019-07-09 23:59:59') and
is_new = 1
group by
case
when score = 100 then 'A'
when score < 100 and score >= 40 then 'B'
else 'C'
end;
Table grade_info
has multi-column index: (type, date, is_new, score)
The conditions for columns (type, date, is_new) are always used in this query. Here is the explain query plan result.
selectid | order | from | detail
--------------------------------
0 0 0 SEARCH TABLE grade_info USING INDEX idx_03 (type<?) (~2777 rows)
0 0 0 USE TEMP B-TREE FOR GROUP BY
and I want the result like this.
A | 5124
B | 124
C | 12354
Upvotes: 1
Views: 954
Reputation: 3886
As Shawn suggests, try changing the index to have the date
column as the first column:
CREATE INDEX [idx_cover] ON [grade_info] ([date], [is_new], [type], [score]);
sqlite allows reference to aliased expressions in the WHERE and GROUP BY clauses, so you can simply say GROUP BY range
rather than repeating the CASE statement. This probably won't change the efficiency, but makes the query shorter and more readable.
If you run ANALYZE
as MikeT suggests, the execution plan should change to say "COVERING INDEX...". If I understand correctly, that indicates that the entire query can be executed by traversing the single multi-column index without going back to the table data.
Try date BETWEEN '2019-07-09 00:00:00' AND '2019-07-09 23:59:59'
.
Finally, CASE... WHEN is short-circuited, so make sure to place the more likely cases first so that it avoids unnecessary calculations. Also eliminate redundant conditional checks. If you have already checked a certain range in a previous condition, no need to re-evaluate that range in the next condition. (If you have already ruled out score = 100, then it is unnecessary to check score < 100 since it will of course be less than 100... assuming that all scores are ensured to be in the range 0 to 100) For instance, if scores are uniformly distributed, then the following could be faster, possibly eliminating +17000 conditional checks.
SELECT
CASE
WHEN score < 40 then 'C'
WHEN score < 100 then 'B' -- already tested to be >= 40
ELSE 'A' -- already tested to be >= 100
END AS range,
count(*) AS count
FROM grade_info
WHERE type < 9 AND
(date BETWEEN '2019-07-09 00:00:00' AND '2019-07-09 23:59:59') AND
is_new = 1
GROUP BY
range;
Upvotes: 1