Reputation: 428
I need to get the distinct elements of ref and alt. I have a very efficient query until I add distinct and it rescans the base table? Since I have a temp table shouldn't it simply use that as a source of data?
sqlite> explain query plan
...> select t1.ref, t1.alt from (SELECT * from Sample_szes where str_id
= 'STR_832206') as t1;
selectid|order|from|detail
1|0|0|SEARCH TABLE vcfBase AS base USING INDEX vcfBase_strid_idx ( .
str_id=?) (~10 rows)
1|1|1|SEARCH TABLE vcfhomozyg AS hzyg USING INDEX homozyg_strid_idx
(str_id=?) (~10 rows)
2|0|0|SEARCH TABLE vcfBase AS base USING INDEX vcfBase_strid_idx
(str_id=?) (~10 rows)
2|1|1|SEARCH TABLE vcfAlt AS alt USING INDEX vcfAlt_strid_idx
(str_id=?) (~2 rows)
2|2|2|SEARCH TABLE altGT AS gt USING INDEX altGT_strid_idx (str_id=?) (~2 rows)
0|0|0|COMPOUND SUBQUERIES 1 AND 2 (UNION ALL)
Add distinct and it rescans the large base table.
sqlite> explain query plan
...> select distinct t1.ref, t1.alt from (SELECT * from Sample_szes
where str_id = 'STR_832206') as t1;
selectid|order|from|detail
2|0|0|SCAN TABLE vcfBase AS base (~1000000 rows)
2|1|1|SEARCH TABLE vcfhomozyg AS hzyg USING INDEX homozyg_strid_idx
(str_id=?) (~10 rows)
3|0|0|SCAN TABLE vcfBase AS base (~1000000 rows)
3|1|1|SEARCH TABLE vcfAlt AS alt USING INDEX vcfAlt_strid_idx (str_id=?) (~2 rows)
3|2|2|SEARCH TABLE altGT AS gt USING INDEX altGT_strid_idx (str_id=?) (~2 rows)
1|0|0|COMPOUND SUBQUERIES 2 AND 3 (UNION ALL)
0|0|0|SCAN SUBQUERY 1 (~1400000 rows)
0|0|0|USE TEMP B-TREE FOR DISTINCT
Upvotes: 0
Views: 81
Reputation: 56938
You should create a composite index for the ref and alt columns. The index would then be used. Otherwise the temporary B-TREE (index) is created which requires an entire scan to sort the data for the index.
I believe the explanation is as per :-
If a SELECT query contains an ORDER BY, GROUP BY or DISTINCT clause, SQLite may need to use a temporary b-tree structure to sort the output rows. Or, it might use an index. Using an index is almost always much more efficient than performing a sort.
If a temporary b-tree is required, a record is added to the EXPLAIN QUERY PLAN output with the "detail" field set to a string value of the form "USE TEMP B-TREE FOR xxx", where xxx is one of "ORDER BY", "GROUP BY" or "DISTINCT". For example:
sqlite> EXPLAIN QUERY PLAN SELECT c, d FROM t2 ORDER BY c; QUERY PLAN |--SCAN TABLE t2 `--USE TEMP B-TREE FOR ORDER BY
In this case using the temporary b-tree can be avoided by creating an index on t2(c), as follows:
sqlite> CREATE INDEX i4 ON t2(c); sqlite> EXPLAIN QUERY PLAN SELECT c, d FROM t2 ORDER BY c; QUERY PLAN `--SCAN TABLE t2 USING INDEX i4
Upvotes: 1
Reputation: 428
I think I might have found the answer. on my mac I have the following version of sqlite
SQLite version 3.19.3 2017-06-27 16:48:08
sqlite> explain query plan
...> select distinct t1.ref, t1.alt from (SELECT * from Sample_szes where str_id = 'STR_832206') as t1;
2|0|1|SEARCH TABLE vcfhomozyg AS hzyg USING INDEX homozyg_strid_idx (str_id=?)
2|1|0|SEARCH TABLE vcfBase AS base USING INDEX vcfBase_strid_idx (str_id=?)
3|0|1|SEARCH TABLE vcfAlt AS alt USING INDEX vcfAlt_strid_idx (str_id=?)
3|1|0|SEARCH TABLE vcfBase AS base USING INDEX vcfBase_strid_idx (str_id=?)
3|2|2|SEARCH TABLE altGT AS gt USING INDEX altGT_strid_idx (str_id=?)
1|0|0|COMPOUND SUBQUERIES 2 AND 3 (UNION ALL)
0|0|0|SCAN SUBQUERY 1
0|0|0|USE TEMP B-TREE FOR DISTINCT
Upvotes: 0