Reputation: 2349
My table (projects):
id, lft, rgt
1, 1, 6
2, 2, 3
3, 4, 5
4, 7, 10
5, 8, 9
6, 11, 12
7, 13, 14
As you may have noticed, this is hierarchical data using the nested set model. Tree pretty-printed:
1
2
3
4
5
6
7
I want to select all sub projects under project 1 and 4. I can do this with:
SELECT p.id
FROM projects AS p, projects AS ps
WHERE (ps.id = 1 OR ps.id = 4)
AND p.lft BETWEEN ps.lft AND ps.rgt
However, this is very slow with a large table, when running EXPLAIN (Query) i get:
+----+-------------+-------+-------+------------------------+---------+---------+------+------+-------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+------------------------+---------+---------+------+------+-------------------------------------------------+
| 1 | SIMPLE | ps | range | PRIMARY,lft,rgt,lftRgt | PRIMARY | 4 | NULL | 2 | Using where |
| 1 | SIMPLE | p | ALL | lft,lftRgt | NULL | NULL | NULL | 7040 | Range checked for each record (index map: 0x12) |
+----+-------------+-------+-------+------------------------+---------+---------+------+------+-------------------------------------------------+
(The project table has indexes on lft, rgt, and lft-rgt. As you can see, mysql does not use any index, and loops through the 7040 records)
I have found that if I only select for one of the super project, mysql manages to use the indexes:
SELECT p.id
FROM projects AS p, projects AS ps
WHERE ps.id = 1
AND p.lft BETWEEN ps.lft AND ps.rgt
EXPLAINs to:
+----+-------------+-------+-------+------------------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+------------------------+---------+---------+-------+------+-------------+
| 1 | SIMPLE | ps | const | PRIMARY,lft,rgt,lftRgt | PRIMARY | 4 | const | 1 | |
| 1 | SIMPLE | p | range | lft,lftRgt | lft | 4 | NULL | 7 | Using where |
+----+-------------+-------+-------+------------------------+---------+---------+-------+------+-------------+
FINALLY, my question: I there any way i can SELECT rows matching multiple ranges, and still benefit from indexes?
Upvotes: 2
Views: 2365
Reputation: 425341
Your query does merge the multiple ranges.
It uses a range
access method to combine the multiple ranges on p
(which is leading in the join).
For each row returned from p
, it checks the best method to retrieve all rows from ps
for the given values of p.lft
and p.rgt
. Depending on the query selectivity, it may be either a fullscan over ps
or a index lookup over one of two possible indexes.
The number of rows shown in the EXPLAIN
means nothing: the EXPLAIN
just shows the worst possible outcome. It doesn't necessarily mean that all these rows will be examined. Whether they will or not the optimizer can only tell in runtime.
The documentation snippet about the impossibility to merge the multiple ranges is only valid for SPATIAL
indexes (R-Tree
those that you create over GEOMETRY
types). These indexes are good for the queries that search upwards (the ancestors of a given project) but not downwards.
A plain B-Tree
index can combine the multiple ranges. From the documentation:
For all types of indexes, multiple range conditions combined with
OR
orAND
form a range condition.
The real problem is that the optimizer in MySQL
cannot make a single correct decision: either use a single fullscan (with ps
leading), or make several range scans.
Say, you have 10,000
rows and your projects boundaries are 0-500
and 2000-2500
. The optimizer will see that each boundary will benefit from the index, the range check
will result in two range accesses, while a single fullscan would be better.
It may be even worse if your project boundaries are, say, 0-3000
and 5000-6000
. In this case the optimizer will make two fullscans, while one would suffice.
To help the optimizer make the correct decision, you should make the covering index on (lft, id)
in this order:
CREATE INDEX ix_lft_id ON projects (lft, id)
The tipping point for using the fullscan
over a covering index rather than a range condition is 90%
, that means you will never have more than a one fullscan in your actual plan.
Upvotes: 1
Reputation: 5557
From 7.2.5.1. The Range Access Method for Single-Part Indexes in MySQL reference manual:
Currently, MySQL does not support merging multiple ranges for the range access method for spatial indexes. To work around this limitation, you can use a UNION with identical SELECT statements, except that you put each spatial predicate in a different SELECT.
So you need to have a union of two different selects.
Upvotes: 2
Reputation: 46882
have you tried a union? take your second example, add "union" underneath and the repeat but matching id 4. i don't know if it would work, but it seems like an obvious thing to try.
edit:
SELECT p.id
FROM projects AS p, projects AS ps
WHERE ps.id = 1
AND p.lft BETWEEN ps.lft AND ps.rgt
UNION
SELECT p.id
FROM projects AS p, projects AS ps
WHERE ps.id = 4
AND p.lft BETWEEN ps.lft AND ps.rgt
Upvotes: 1