Reputation: 15248
Update: added background info and more explanation
Given ~300K records in the following table
CREATE TABLE t (
id INTEGER PRIMARY KEY,
a TEXT,
b TEXT,
c TEXT,
deleted INTEGER DEFAULT 0
);
CREATE INDEX ix_t ON t (deleted) WHERE deleted = 0;
1) SELECT Count(id) FROM t WHERE deleted = 0;
2) SELECT Count(DISTINCT a) FROM t WHERE deleted = 0;
3) SELECT Count(DISTINCT b) FROM t WHERE deleted = 0;
4) SELECT Count(DISTINCT c) FROM t WHERE deleted = 0;
Query 1 takes 4-5 ms. The other three queries COUNT (DISTINCT <col>)
take 600-900ms. How can I speed up these queries to the same order as the first one? I created following indexes but it didn't help
CREATE INDEX ix_t_a ON t (a, deleted) WHERE deleted = 0;
-- (and so on the columns b and c as well)
EXPLAIN QUERY PLAN SELECT Count(DISTINCT a) FROM t WHERE deleted = 0;
-- output: SEARCH TABLE t USING INDEX ix_t (deleted=?)
As we can see above, the new index is not being used.
The deleted
column is a flag to keep track of rows that are to be excluded from all queries. For the most part, such rows will be very few, but it is essential that they are not used in the counts and selects.
Eventually the number of rows might grow to 3 times as much, let's say 1M. Even then, the number of deleted
rows will be minimal.
The COUNT (DISTINCT column)
counts are to create facets. Let's say someone searches for all rows WHERE a = 'foo'
. I need to return the matching rows, and I also need to return how many DISTINCT
b
and c
are present in those rows. So, I will do something like
-- number of rows in the result set
SELECT COUNT(id) FROM t WHERE deleted = 0 AND a = 'foo';
-- the actual result set
SELECT * FROM t WHERE deleted = 0 AND a = 'foo' LIMIT 30 OFFSET 0;
-- facet counts
SELECT COUNT (DISTINCT b) FROM t WHERE deleted = 0 AND a = 'foo';
SELECT COUNT (DISTINCT c) FROM t WHERE deleted = 0 AND a = 'foo';
In the very first instance, since there is no WHERE
clause, the above queries will be
-- number of rows in the result set
SELECT COUNT(id) FROM t WHERE deleted = 0;
-- the actual result set
SELECT * FROM t WHERE deleted = 0 LIMIT 30 OFFSET 0;
-- facet counts
SELECT COUNT (DISTINCT a) FROM t WHERE deleted = 0;
SELECT COUNT (DISTINCT b) FROM t WHERE deleted = 0;
SELECT COUNT (DISTINCT c) FROM t WHERE deleted = 0;
The actual columns in the table that make up the facets are about 8 or 9. So, I have to do about 8 or 9 SELECT COUNT (DISTINCT col)
. With each taking about 600-900ms, that is almost 6-10 seconds. Way too slow for a query from the point-of-view of the user. Reducing the selects by half or ⅔ makes a huge difference.
In reality, I also have a query cache, so the results of any query performed once will be cached, and will be very fast the second time, as long as it is exactly the same query. Here, by query, I mean a query from the pov of the user. Of course, each query from the pov of the user results in 9-10 database queries. Nevertheless, speed is very important to create a performant application.
Upvotes: 0
Views: 808
Reputation: 56948
How can I speed up these queries to the same order as the first one?
I'm not sure if you can that much and if so it may well depend upon the data itself e.g. if analyze does some amazing things.
DISTINCT has quite some impact. So that may be the sole reason for the discrepancy
If you just use EXPLAIN for the first query without DISTINCT the result is:-
For the other queries the EXPLAIN result is :-
The additional (rather than different operations are highlighted i.e. there is more to be done)
You may wish to have a look at The SQLite Bytecode Engine - The Opcodes
How can I speed up these queries to the same order as the first one?
Perhaps consider adapting the following code that was used to have a little play and conclude that ANALYZE may not improve matters and nor does playing around (a little bit) with the indexes appear to make much difference :-
DROP TABLE IF EXISTS t;
CREATE TABLE t (id INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT, deleted INTEGER DEFAULT 0);
WITH
alphabet(letters) AS (SELECT 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
cte1(counter,deleted) AS (SELECT 1, abs(random()) % 2 UNION ALL SELECT counter+1, abs(random()) % 2 FROM cte1 LIMIT 300000)
INSERT INTO t (deleted,a,b,c)
SELECT deleted,
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1),
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1),
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1)
FROM cte1;
SELECT * FROM t; /* COMMENT/UNCOMMENT WITH/WITHOUT -- To HIDE/SHOW TABLE */
CREATE INDEX ix_t ON t (deleted) WHERE deleted = 0;
-- EXPLAIN /* UNCOMMENT BY REMOVING -- ON THIS LINE TO JUST DO EXPLAIN */
-- QUERY PLAN /* TO DO EXPLAIN QUERY PLAN UNCOMMNET LINE ABOVE AS ABOVE and UNCOMMENT THIS SAME WAY */
SELECT Count(id) FROM t WHERE deleted = 0;
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT a) FROM t WHERE deleted = 0;
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT b) FROM t WHERE deleted = 0;
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT c) FROM t WHERE deleted = 0;
ANALYZE;
SELECT Count(id) FROM t WHERE deleted = 0;
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT a) FROM t WHERE deleted = 0;
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT b) FROM t WHERE deleted = 0;
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT c) FROM t WHERE deleted = 0;
/*<<<<<<<<<< ATTEMPT 2 >>>>>>>>>>*/
DROP TABLE IF EXISTS t;
CREATE TABLE t (id INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT, deleted INTEGER DEFAULT 0);
WITH
alphabet(letters) AS (SELECT 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
cte1(counter,deleted) AS (SELECT 1, abs(random()) % 2 UNION ALL SELECT counter+1, abs(random()) % 2 FROM cte1 LIMIT 300000)
INSERT INTO t (deleted,a,b,c)
SELECT deleted,
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1),
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1),
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1)
FROM cte1;
-- SELECT * FROM t;
CREATE INDEX ix_t ON t (deleted,a,b,c) WHERE deleted = 0;
-- EXPLAIN
-- QUERY PLAN
SELECT Count(id) FROM t WHERE deleted = 0;
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT a) FROM t WHERE deleted = 0;
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT b) FROM t WHERE deleted = 0;
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT c) FROM t WHERE deleted = 0;
ANALYZE;
SELECT Count(id) FROM t WHERE deleted = 0;
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT a) FROM t WHERE deleted = 0;
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT b) FROM t WHERE deleted = 0;
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT c) FROM t WHERE deleted = 0;
This when run (assuming the tool used (ok in Navicat) allows a series of queries). Will :-
Example log :-
DROP TABLE IF EXISTS t
> OK
> Time: 1.229s
CREATE TABLE t (id INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT, deleted INTEGER DEFAULT 0)
> OK
> Time: 0.132s
WITH
alphabet(letters) AS (SELECT 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
cte1(counter,deleted) AS (SELECT 1, abs(random()) % 2 UNION ALL SELECT counter+1, abs(random()) % 2 FROM cte1 LIMIT 300000)
INSERT INTO t (deleted,a,b,c)
SELECT deleted,
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1),
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1),
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1)
FROM cte1
> Affected rows: 300000
> Time: 0.75s
-- SELECT * FROM t;
CREATE INDEX ix_t ON t (deleted) WHERE deleted = 0
> OK
> Time: 0.195s
EXPLAIN
-- QUERY PLAN
SELECT Count(id) FROM t WHERE deleted = 0
> OK
> Time: 0s
EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT a) FROM t WHERE deleted = 0
> OK
> Time: 0s
EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT b) FROM t WHERE deleted = 0
> OK
> Time: 0s
EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT c) FROM t WHERE deleted = 0
> OK
> Time: 0s
ANALYZE
> OK
> Time: 0.137s
SELECT Count(id) FROM t WHERE deleted = 0
> OK
> Time: 0.031s
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT a) FROM t WHERE deleted = 0
> OK
> Time: 0.057s
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT b) FROM t WHERE deleted = 0
> OK
> Time: 0.054s
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT c) FROM t WHERE deleted = 0
> OK
> Time: 0.055s
/*<<<<<<<<<< ATTEMPT 2 >>>>>>>>>>*/
DROP TABLE IF EXISTS t
> OK
> Time: 0.891s
CREATE TABLE t (id INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT, deleted INTEGER DEFAULT 0)
> OK
> Time: 0.153s
WITH
alphabet(letters) AS (SELECT 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
cte1(counter,deleted) AS (SELECT 1, abs(random()) % 2 UNION ALL SELECT counter+1, abs(random()) % 2 FROM cte1 LIMIT 300000)
INSERT INTO t (deleted,a,b,c)
SELECT deleted,
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1),
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1),
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1)
FROM cte1
> Affected rows: 300000
> Time: 0.643s
-- SELECT * FROM t;
CREATE INDEX ix_t ON t (deleted,a,b,c) WHERE deleted = 0
> OK
> Time: 0.583s
-- EXPLAIN
-- QUERY PLAN
SELECT Count(id) FROM t WHERE deleted = 0
> OK
> Time: 0.029s
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT a) FROM t WHERE deleted = 0
> OK
> Time: 0.041s
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT b) FROM t WHERE deleted = 0
> OK
> Time: 0.029s
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT c) FROM t WHERE deleted = 0
> OK
> Time: 0.031s
ANALYZE
> OK
> Time: 0.121s
SELECT Count(id) FROM t WHERE deleted = 0
> OK
> Time: 0.038s
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT a) FROM t WHERE deleted = 0
> OK
> Time: 0.046s
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT b) FROM t WHERE deleted = 0
> OK
> Time: 0.029s
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT c) FROM t WHERE deleted = 0
> OK
> Time: 0.031s
I haven't done much working out with timings but perhaps consider this (using a tempt table with only deleted = 0 rows) :-
DROP TABLE IF EXISTS t;
CREATE TABLE t (id INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT, deleted INTEGER DEFAULT 0);
WITH
alphabet(letters) AS (SELECT 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
cte1(counter,deleted) AS (SELECT 1, abs(random()) % 2 UNION ALL SELECT counter+1, abs(random()) % 2 FROM cte1 LIMIT 300000)
INSERT INTO t (deleted,a,b,c)
SELECT deleted,
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1),
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1),
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1)
FROM cte1;
-- SELECT * FROM t; /* COMMENT/UNCOMMENT WITH/WITHOUT -- To HIDE/SHOW TABLE */
CREATE TEMP TABLE trimmedt AS SELECT * FROM t WHERE deleted = 0;
-- EXPLAIN /* UNCOMMENT BY REMOVING -- ON THIS LINE TO JUST DO EXPLAIN */
-- QUERY PLAN /* TO DO EXPLAIN QUERY PLAN UNCOMMNET LINE ABOVE AS ABOVE and UNCOMMENT THIS SAME WAY */
SELECT Count(id) FROM trimmedt;
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT a) FROM trimmedt;
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT b) FROM trimmedt;
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT c) FROM trimmedt;
Message Log :-
DROP TABLE IF EXISTS t
> OK
> Time: 1.242s
CREATE TABLE t (id INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT, deleted INTEGER DEFAULT 0)
> OK
> Time: 0.103s
WITH
alphabet(letters) AS (SELECT 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
cte1(counter,deleted) AS (SELECT 1, abs(random()) % 2 UNION ALL SELECT counter+1, abs(random()) % 2 FROM cte1 LIMIT 300000)
INSERT INTO t (deleted,a,b,c)
SELECT deleted,
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1),
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1),
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1)
FROM cte1
> Affected rows: 300000
> Time: 0.722s
-- SELECT * FROM t; /* COMMENT/UNCOMMENT WITH/WITHOUT -- To HIDE/SHOW TABLE */
CREATE TEMP TABLE trimmedt AS SELECT * FROM t WHERE deleted = 0
> OK
> Time: 0.091s
-- EXPLAIN /* UNCOMMENT BY REMOVING -- ON THIS LINE TO JUST DO EXPLAIN */
-- QUERY PLAN /* TO DO EXPLAIN QUERY PLAN UNCOMMNET LINE ABOVE AS ABOVE and UNCOMMENT THIS SAME WAY */
SELECT Count(id) FROM trimmedt
> OK
> Time: 0.009s
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT a) FROM trimmedt
> OK
> Time: 0.03s
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT b) FROM trimmedt
> OK
> Time: 0.03s
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT c) FROM trimmedt
> OK
> Time: 0.031s
Upvotes: 2