Reputation: 62
I have a table shown below
CREATE TABLE `xcpRush2_SandraTriplets` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`idConceptStart` int(11) NOT NULL,
`idConceptLink` int(11) NOT NULL,
`idConceptTarget` int(11) NOT NULL,
`flag` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `idx_name` (`idConceptStart`,`idConceptLink`,`idConceptTarget`),
KEY `idConceptStart` (`idConceptStart`,`idConceptLink`,`idConceptTarget`),
KEY `idConceptStart_4` (`idConceptStart`),
KEY `idConceptTarget` (`idConceptTarget`),
KEY `idConceptLink` (`idConceptLink`,`idConceptTarget`)
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Data will look like in db fiddle : https://www.db-fiddle.com/f/ejXP7qgvwNqAZeuaN3DFNz/3
As you can see it is fully indexed on on several column.
On my table I have about 800k idConceptStart satisfying the condition of having
idConceptLink = 5 idConceptTarget = 14500 AND
idConceptLink = 3 idConceptLink = 14504 AND
idConceptLink = 12 idConceptLink = 11
When I execute this query
SELECT * FROM xcpRush2_SandraTriplets l
JOIN xcpRush2_SandraTriplets link1 ON link1.idConceptStart = l.idConceptStart
JOIN xcpRush2_SandraTriplets link2 ON link2.idConceptStart = link1.idConceptStart
WHERE
l.idConceptLink = 5 AND
l.idConceptTarget = 14500 AND
l.flag != 1 AND
link2.flag != 1 AND
link2.idConceptLink = 3 AND
link2.idConceptTarget = 14504 AND
link1.flag != 1 AND
link1.idConceptTarget = 12 AND
l.idConceptLink = 11
ORDER BY l.idConceptStart DESC LIMIT 10
The query takes about 30 seconds (!) to render
But if I remove this (and only this)
link2.idConceptLink = 3 AND link2.idConceptTarget =14504
then the query takes 20 milliseconds to render
SELECT * FROM xcpRush2_SandraTriplets l
JOIN xcpRush2_SandraTriplets link1 ON link1.idConceptStart = l.idConceptStart
JOIN xcpRush2_SandraTriplets link2 ON link2.idConceptStart = l.idConceptStart
WHERE
l.idConceptLink = 5 AND
l.idConceptTarget = 14500 AND
l.flag != 1 AND
link2.flag != 1 AND
link1.flag != 1 AND
link1.idConceptTarget = 12 AND
link1.idConceptLink = 11
ORDER BY l.idConceptStart DESC LIMIT 10
I'm puzzled because the table is indexed on idConceptLink
,idConceptTarget
and each of those query taken separately are very fast to render < 20 ms
Also every idConceptLink
,idConceptTarget
pairs in the query are all returning a heavy number of rows (not only link2.idConceptLink = 3 AND link2.idConceptTarget =14504)
Could you help me identifying the bottle neck ?
Edit
After more findings in the comments the issue seems to be on the ORDER BY. Depending if I join on l.idConceptStart or link1.idConceptStart or link2.idConceptStart the query is slow. On my actual case ORDER BY link2.idConceptStart is slow.
The index structure is the following
CREATE TABLE `xcpRush2_SandraTriplets` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`idConceptStart` int(11) NOT NULL,
`idConceptLink` int(11) NOT NULL,
`idConceptTarget` int(11) NOT NULL,
`flag` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `idx_name` (`idConceptStart`,`idConceptLink`,`idConceptTarget`),
KEY `idConceptStart` (`idConceptStart`),
KEY `idConceptTarget` (`idConceptTarget`),
KEY `idConceptLink` (`idConceptLink`,`idConceptTarget`)
) ENGINE=InnoDB AUTO_INCREMENT=5747878 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Index are
xcprush2_sandratriplets 0 PRIMARY 1 id A 5207892 NULL NULL BTREE
xcprush2_sandratriplets 0 idx_name 1 idConceptStart A 1243366 NULL NULL BTREE
xcprush2_sandratriplets 0 idx_name 2 idConceptLink A 5207936 NULL NULL BTREE
xcprush2_sandratriplets 0 idx_name 3 idConceptTarget A 5207936 NULL NULL BTREE
xcprush2_sandratriplets 1 idConceptStart 1 idConceptStart A 1122352 NULL NULL BTREE
xcprush2_sandratriplets 1 idConceptTarget 1 idConceptTarget A 123870 NULL NULL BTREE
xcprush2_sandratriplets 1 idConceptLink 1 idConceptLink A 5 NULL NULL BTREE
xcprush2_sandratriplets 1 idConceptLink 2 idConceptTarget A 154480 NULL NULL BTREE
The query is slow when I do
SELECT l.idConceptStart, l.idConceptLink, l.`idConceptTarget` FROM xcpRush2_SandraTriplets l JOIN xcpRush2_SandraTriplets link1 ON link1.idConceptStart = l.idConceptStart JOIN xcpRush2_SandraTriplets link2 ON link2.idConceptStart = l.idConceptStart
WHERE l.idConceptLink = 5
AND l.idConceptTarget = 14500
AND l.flag != 1
AND link1.flag != 1 AND
link1.idConceptTarget =14504 AND link1.idConceptLink = 3 AND link2.flag != 1 AND
link2.idConceptTarget =12 AND link2.idConceptLink = 11 ORDER BY link2.idConceptStart DESC LIMIT 1000 OFFSET 0
Here is the EXPLAIN structure
1 SIMPLE link1 NULL ref idx_name,idConceptStart,idConceptTarget,idConceptLink idConceptTarget 4 const 1611256 18.00 Using where; Using temporary; Using filesort
1 SIMPLE l NULL eq_ref idx_name,idConceptStart,idConceptTarget,idConceptLink idx_name 12 sandra.link1.idConceptStart,const,const 1 90.00 Using where
1 SIMPLE link2 NULL eq_ref idx_name,idConceptStart,idConceptTarget,idConceptLink idx_name 12 sandra.link1.idConceptStart,const,const 1 90.00 Using where
The query is fast when I do
SELECT l.idConceptStart, l.idConceptLink, l.`idConceptTarget` FROM xcpRush2_SandraTriplets l JOIN xcpRush2_SandraTriplets link1 ON link1.idConceptStart = l.idConceptStart JOIN xcpRush2_SandraTriplets link2 ON link2.idConceptStart = l.idConceptStart
WHERE l.idConceptLink = 5
AND l.idConceptTarget = 14500
AND l.flag != 1
AND link1.flag != 1 AND
link1.idConceptTarget =14504 AND link1.idConceptLink = 3 AND link2.flag != 1 AND
link2.idConceptTarget =12 AND link2.idConceptLink = 11 ORDER BY l.idConceptStart DESC LIMIT 1000 OFFSET 0
Here is the EXPLAIN structure
1 SIMPLE l NULL index idx_name,idConceptStart,idConceptTarget,idConceptLink idConceptStart 4 NULL 13036 3.08 Using where
1 SIMPLE link1 NULL eq_ref idx_name,idConceptStart,idConceptTarget,idConceptLink idx_name 12 sandra.l.idConceptStart,const,const 1 90.00 Using where
1 SIMPLE link2 NULL eq_ref idx_name,idConceptStart,idConceptTarget,idConceptLink idx_name 12 sandra.l.idConceptStart,const,const 1 90.00 Using where
Edit 2
the optimal table to sort seems to be random. Now that I run the same query few hours laters (some insert occurred) but using the same query the structure of solving the key order changed. The fast query becomes the slow one and the slow one become the fast one. If I ORDER BY l.idConceptStart the following explain
the table resolution order for keys seems to be random. I'm completely lost. In the end the only think I need is to get last database entry first
Upvotes: 1
Views: 120
Reputation: 142560
"fully indexed" -- No. You have a few indexes, including some redundant ones.
This may be the optimal index for your query:
INDEX(link, target, start)
Let's talk about flag
. How many different values does it have? If only 2 (say, 0 an 1), then change to flag = 0
instead of flag != 1
. The Optimizer is better at working with =
tests than !=
. And change to INDEX(link, target, flag, start)
.
What percentage of rows have flag=1? This could lead to some more thoughts.
You have UNIQUE
key, plus a surrogate id
? Do you reference id
from any other table? If not, get rid of it, and promote the UNIQUE
to PRIMARY KEY
. But at that point, I would want the columns in that PK to be rearranged to match my suggestion.
Some rules for building indexes:
=
first (link
and target
, in either order)WHERE
(!=
is stopping that) if you want it to also include the ORDER BY
column(s). This is especially true if there is also a LIMIT
.UNIQUE(a,b,c)
precludes the need for INDEX(a,b,c)
INDEX(a,b)
precludes the need for INDEX(a)
.Upvotes: 1
Reputation: 2343
By using ORDER BY l.idConceptStart DESC rather than ORDER BY link2.idConceptStart DESC you were able to avoid per EXPLAIN the temp storage and filesort and the ROWS to be accessed were significantly reduced.
View my profile, Network profile for contact info.
Upvotes: 0