Shaban Shaame
Shaban Shaame

Reputation: 62

SQL triple join on the same table is slow

I have a table shown below

 CREATE TABLE `xcpRush2_SandraTriplets` (
   `id` int(11) NOT NULL AUTO_INCREMENT,
  `idConceptStart` int(11) NOT NULL,
  `idConceptLink` int(11) NOT NULL,
  `idConceptTarget` int(11) NOT NULL,
  `flag` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `idx_name` (`idConceptStart`,`idConceptLink`,`idConceptTarget`),
  KEY `idConceptStart` (`idConceptStart`,`idConceptLink`,`idConceptTarget`),
  KEY `idConceptStart_4` (`idConceptStart`),
  KEY `idConceptTarget` (`idConceptTarget`),
  KEY `idConceptLink` (`idConceptLink`,`idConceptTarget`)
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Data will look like in db fiddle : https://www.db-fiddle.com/f/ejXP7qgvwNqAZeuaN3DFNz/3

As you can see it is fully indexed on on several column.

On my table I have about 800k idConceptStart satisfying the condition of having

idConceptLink = 5 idConceptTarget = 14500 AND
idConceptLink = 3 idConceptLink = 14504 AND
idConceptLink = 12 idConceptLink = 11

When I execute this query

SELECT * FROM  xcpRush2_SandraTriplets l    
   JOIN  xcpRush2_SandraTriplets link1 ON link1.idConceptStart = l.idConceptStart  
   JOIN  xcpRush2_SandraTriplets link2 ON link2.idConceptStart = link1.idConceptStart

    WHERE 
       l.idConceptLink = 5  AND 
       l.idConceptTarget = 14500 AND 
       l.flag != 1 AND 

       link2.flag != 1 AND 
       link2.idConceptLink = 3 AND 
       link2.idConceptTarget = 14504 AND 

       link1.flag != 1 AND 
       link1.idConceptTarget = 12 AND 
       l.idConceptLink = 11  

    ORDER BY l.idConceptStart DESC  LIMIT 10 

Here is the SQL Explain SQL Explain

The query takes about 30 seconds (!) to render

But if I remove this (and only this)

 link2.idConceptLink = 3 AND link2.idConceptTarget =14504

then the query takes 20 milliseconds to render

    SELECT * FROM  xcpRush2_SandraTriplets l    
   JOIN  xcpRush2_SandraTriplets link1 ON link1.idConceptStart = l.idConceptStart  
   JOIN  xcpRush2_SandraTriplets link2 ON link2.idConceptStart = l.idConceptStart 
   WHERE 
      l.idConceptLink = 5 AND 
      l.idConceptTarget = 14500 AND 
      l.flag != 1  AND 

      link2.flag != 1 AND       

      link1.flag != 1 AND 
      link1.idConceptTarget = 12 AND 
      link1.idConceptLink = 11  

    ORDER BY l.idConceptStart DESC  LIMIT 10 

SQL Explain

I'm puzzled because the table is indexed on idConceptLink,idConceptTarget and each of those query taken separately are very fast to render < 20 ms

Also every idConceptLink,idConceptTarget pairs in the query are all returning a heavy number of rows (not only link2.idConceptLink = 3 AND link2.idConceptTarget =14504)

Could you help me identifying the bottle neck ?

Edit

After more findings in the comments the issue seems to be on the ORDER BY. Depending if I join on l.idConceptStart or link1.idConceptStart or link2.idConceptStart the query is slow. On my actual case ORDER BY link2.idConceptStart is slow.

The index structure is the following

CREATE TABLE `xcpRush2_SandraTriplets` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `idConceptStart` int(11) NOT NULL,
  `idConceptLink` int(11) NOT NULL,
  `idConceptTarget` int(11) NOT NULL,
  `flag` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `idx_name` (`idConceptStart`,`idConceptLink`,`idConceptTarget`),
  KEY `idConceptStart` (`idConceptStart`),
  KEY `idConceptTarget` (`idConceptTarget`),
  KEY `idConceptLink` (`idConceptLink`,`idConceptTarget`)
) ENGINE=InnoDB AUTO_INCREMENT=5747878 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Index are

xcprush2_sandratriplets 0   PRIMARY 1   id  A   5207892 NULL    NULL        BTREE       
xcprush2_sandratriplets 0   idx_name    1   idConceptStart  A   1243366 NULL    NULL        BTREE       
xcprush2_sandratriplets 0   idx_name    2   idConceptLink   A   5207936 NULL    NULL        BTREE       
xcprush2_sandratriplets 0   idx_name    3   idConceptTarget A   5207936 NULL    NULL        BTREE       
xcprush2_sandratriplets 1   idConceptStart  1   idConceptStart  A   1122352 NULL    NULL        BTREE       
xcprush2_sandratriplets 1   idConceptTarget 1   idConceptTarget A   123870  NULL    NULL        BTREE       
xcprush2_sandratriplets 1   idConceptLink   1   idConceptLink   A   5   NULL    NULL        BTREE       
xcprush2_sandratriplets 1   idConceptLink   2   idConceptTarget A   154480  NULL    NULL        BTREE

Indexes

The query is slow when I do

 SELECT  l.idConceptStart, l.idConceptLink, l.`idConceptTarget` FROM  xcpRush2_SandraTriplets l  JOIN  xcpRush2_SandraTriplets link1 ON link1.idConceptStart = l.idConceptStart  JOIN  xcpRush2_SandraTriplets link2 ON link2.idConceptStart = l.idConceptStart 
    WHERE l.idConceptLink = 5  
    AND l.idConceptTarget = 14500
    AND l.flag != 1 
     AND link1.flag != 1 AND 
            link1.idConceptTarget =14504 AND link1.idConceptLink = 3 AND link2.flag != 1 AND 
            link2.idConceptTarget =12 AND link2.idConceptLink = 11  ORDER BY  link2.idConceptStart DESC  LIMIT 1000 OFFSET 0

Here is the EXPLAIN structure

1   SIMPLE  link1   NULL    ref idx_name,idConceptStart,idConceptTarget,idConceptLink   idConceptTarget 4   const   1611256 18.00   Using where; Using temporary; Using filesort
1   SIMPLE  l   NULL    eq_ref  idx_name,idConceptStart,idConceptTarget,idConceptLink   idx_name    12  sandra.link1.idConceptStart,const,const 1   90.00   Using where
1   SIMPLE  link2   NULL    eq_ref  idx_name,idConceptStart,idConceptTarget,idConceptLink   idx_name    12  sandra.link1.idConceptStart,const,const 1   90.00   Using where

Slow explain

The query is fast when I do

SELECT  l.idConceptStart, l.idConceptLink, l.`idConceptTarget` FROM  xcpRush2_SandraTriplets l  JOIN  xcpRush2_SandraTriplets link1 ON link1.idConceptStart = l.idConceptStart  JOIN  xcpRush2_SandraTriplets link2 ON link2.idConceptStart = l.idConceptStart 
    WHERE l.idConceptLink = 5  
    AND l.idConceptTarget = 14500
    AND l.flag != 1 
     AND link1.flag != 1 AND 
            link1.idConceptTarget =14504 AND link1.idConceptLink = 3 AND link2.flag != 1 AND 
            link2.idConceptTarget =12 AND link2.idConceptLink = 11  ORDER BY  l.idConceptStart DESC  LIMIT 1000 OFFSET 0

Here is the EXPLAIN structure

   1    SIMPLE  l   NULL    index   idx_name,idConceptStart,idConceptTarget,idConceptLink   idConceptStart  4   NULL    13036   3.08    Using where
1   SIMPLE  link1   NULL    eq_ref  idx_name,idConceptStart,idConceptTarget,idConceptLink   idx_name    12  sandra.l.idConceptStart,const,const 1   90.00   Using where
1   SIMPLE  link2   NULL    eq_ref  idx_name,idConceptStart,idConceptTarget,idConceptLink   idx_name    12  sandra.l.idConceptStart,const,const 1   90.00   Using where

fastExplain

Edit 2

the optimal table to sort seems to be random. Now that I run the same query few hours laters (some insert occurred) but using the same query the structure of solving the key order changed. The fast query becomes the slow one and the slow one become the fast one. If I ORDER BY l.idConceptStart the following explain

explain edit 2

the table resolution order for keys seems to be random. I'm completely lost. In the end the only think I need is to get last database entry first

Upvotes: 1

Views: 120

Answers (2)

Rick James
Rick James

Reputation: 142560

"fully indexed" -- No. You have a few indexes, including some redundant ones.

This may be the optimal index for your query:

INDEX(link, target, start)

Let's talk about flag. How many different values does it have? If only 2 (say, 0 an 1), then change to flag = 0 instead of flag != 1. The Optimizer is better at working with = tests than !=. And change to INDEX(link, target, flag, start).

What percentage of rows have flag=1? This could lead to some more thoughts.

You have UNIQUE key, plus a surrogate id? Do you reference id from any other table? If not, get rid of it, and promote the UNIQUE to PRIMARY KEY. But at that point, I would want the columns in that PK to be rearranged to match my suggestion.

Some rules for building indexes:

  • Put columns tested with = first (link and target, in either order)
  • It is better to have the index completely handle the WHERE (!= is stopping that) if you want it to also include the ORDER BY column(s). This is especially true if there is also a LIMIT.
  • UNIQUE(a,b,c) precludes the need for INDEX(a,b,c)
  • INDEX(a,b) precludes the need for INDEX(a).
  • More: http://mysql.rjweb.org/doc.php/index_cookbook_mysql

Upvotes: 1

Wilson Hauck
Wilson Hauck

Reputation: 2343

By using ORDER BY l.idConceptStart DESC rather than ORDER BY link2.idConceptStart DESC you were able to avoid per EXPLAIN the temp storage and filesort and the ROWS to be accessed were significantly reduced.

View my profile, Network profile for contact info.

Upvotes: 0

Related Questions