Philipp
Philipp

Reputation: 11351

Optimize SQL query with many inner joins on same table

I'm stuck with a performance issue:

A shop has an article filter with categories "color", "size", "gender" and "feature". All those details are stored inside an article_criterias table, that looks like this:

Table layout of article_criterias is; this table has about 36.000 rows:

article_id | group    | option | option_val
       100 | "size"   | "35"   |     35.00
       100 | "size"   | "36"   |     36.00
       100 | "size"   | "36½"  |     36.50
       100 | "color"  | "40"   |     40.00
       100 | "color"  | "50"   |     50.00
       100 | "gender" | "1"    |      1.00
       101 | "size"   | "40"   |     40.00
       ...

We have a SQL query that is built dynamically, based on which criteria are currently selected. The query is good for 2-3 criteria, but will get very slow when selecting more than 5 options (each additional INNER JOIN roughly doubles the execution time)

How can we make this SQL faster, maybe even replacing the inner joins with a more performant concept?

This is the query (the logic is correct, just the performance is bad):

-- This SQL is generated when the user selected the following criteria
-- gender: 1
-- color: 80 + 30
-- size 36 + 37 + 38 + 39 + 42 + 46
SELECT
    criteria.group AS `key`,
    criteria.option AS `value`
FROM articles
    INNER JOIN article_criterias AS criteria ON articles.id = criteria.article_id
    INNER JOIN article_criterias AS criteria_gender 
        ON criteria_gender.article_id = articles.id AND criteria_gender.group = "gender"
    INNER JOIN article_criterias AS criteria_color1 
        ON criteria_color1.article_id = articles.id AND criteria_color1.group = "color"
    INNER JOIN article_criterias AS criteria_size2 
        ON criteria_size2.article_id = articles.id AND criteria_size2.group = "size"
    INNER JOIN article_criterias AS criteria_size3 
        ON criteria_size3.article_id = articles.id AND criteria_size3.group = "size"
    INNER JOIN article_criterias AS criteria_size4 
        ON criteria_size4.article_id = articles.id AND criteria_size4.group = "size"
    INNER JOIN article_criterias AS criteria_size5 
        ON criteria_size5.article_id = articles.id AND criteria_size5.group = "size"
    INNER JOIN article_criterias AS criteria_size6 
        ON criteria_size6.article_id = articles.id AND criteria_size6.group = "size"
    INNER JOIN article_criterias AS criteria_size7 
        ON criteria_size7.article_id = articles.id AND criteria_size7.group = "size"
WHERE
    AND (criteria_gender.option IN ("1"))
    AND (criteria_color1.option IN ("80", "30"))
    AND (criteria_size2.option_val BETWEEN 35.500000 AND 36.500000)
    AND (criteria_size3.option_val BETWEEN 36.500000 AND 37.500000)
    AND (criteria_size4.option_val BETWEEN 37.500000 AND 38.500000)
    AND (criteria_size5.option_val BETWEEN 38.500000 AND 39.500000)
    AND (criteria_size6.option_val BETWEEN 41.500000 AND 42.500000)
    AND (criteria_size7.option_val BETWEEN 45.500000 AND 46.500000)

Upvotes: 0

Views: 1738

Answers (3)

Thorsten Kettner
Thorsten Kettner

Reputation: 94914

Key/value tables are really a nuisance. However, in order to find certain criteria matches aggregate your data:

select 
  a.*,
  ac.group AS "key",
  ac.option AS "value"
from articles a
join article_criterias ac on ac.article_id = a.article_id
where a.article_id in
(
  select article_id
  from article_criterias
  group by article_id
  having sum("group" = 'gender' and option = '1') > 0
     and sum("group" = 'color' and option in ('30','80')) > 0
     and sum("group" = 'size' and option_val between 35.5 and 36.5) > 0
     and sum("group" = 'size' and option_val between 36.5 and 37.5) > 0
     and sum("group" = 'size' and option_val between 37.5 and 38.5) > 0
     and sum("group" = 'size' and option_val between 38.5 and 39.5) > 0
     and sum("group" = 'size' and option_val between 41.5 and 42.5) > 0
     and sum("group" = 'size' and option_val between 45.5 and 46.5) > 0
)
order by a.article_id, ac.group, ac.option;

This gets you all articles that are available for gender 1, colors 30 and/or 80, and all listed size ranges, along with all their options. (The size ranges are a bit strange, though; a size 36.5 would meet two ranges for instance.) You get the idea: group by article_id and use HAVING in order to only get article_ids that meet the critria.

As to indexes you'll want

create index idx on article_criterias(article_id, "group", option, option_val);

Upvotes: 2

etsa
etsa

Reputation: 5060

I understand indexs you create solved your problem, but just to play with a pseudo alternative (which avoid multiple INNER JOIN), can you try something like this? (I did test with just three condition. Your condition should be inserted in inner query. To select only the record who meets all conditions, you have to change last WHERE condition (WHERE max = 3, using the number of conditions you wrote above; so if you are using 5 conditions, you should write WHERE max = 5). (I changed the name of columns groups and option, for my ease of use). It's just an idea so pls do some tests and check for performance and pls let me know...

CREATE TABLE CRITERIA (ARTICLE_ID INT, GROU VARCHAR(10), OPT VARCHAR(20), OPTION_VAL NUMERIC(12,2));
CREATE TABLE ARTICLES (ID INT);
INSERT INTO CRITERIA VALUES (100,'size','35',35);
INSERT INTO CRITERIA VALUES (100,'size','36',36);
INSERT INTO CRITERIA VALUES (100,'color','40',40);
INSERT INTO CRITERIA VALUES (100,'gender','1',1);
INSERT INTO CRITERIA VALUES (200,'size','36.2',36.2);
INSERT INTO CRITERIA VALUES (300,'size','36.2',36.2);
INSERT INTO ARTICLES VALUES (100);
INSERT INTO ARTICLES VALUES (200);
INSERT INTO ARTICLES VALUES (300);

-------------------------------------------------------

SELECT D.article_id, D.GROU, D.OPT
FROM (SELECT C.*
     , @o:=CASE WHEN @h=ARTICLE_ID THEN @o ELSE cumul END max
     , @h:=ARTICLE_ID AS a_id
     FROM (SELECT article_id,
             B.GROU, B.OPT,             
             @r:= CASE WHEN @g = B.ARTICLE_ID THEN @r+1 ELSE 1 END cumul,                        
             @g:= B.ARTICLE_ID g                
             FROM CRITERIA B
             CROSS JOIN (SELECT @g:=0, @r:=0) T1
             WHERE (B.GROU='gender' AND B.OPT IN ('1'))
                    OR  (B.GROU='color'  AND B.OPT IN ('40', '30'))
                    OR  (B.GROU='size'   AND B.OPT BETWEEN 35.500000 AND 36.500000)
             ORDER BY article_id
    ) C
CROSS JOIN (SELECT @o:=0, @h:=0) T2
ORDER BY ARTICLE_ID, CUMUL DESC) D
WHERE max=3
;

Output:

article_id  GROU    OPT
100 gender  1
100 color   40
100 size    36

Upvotes: 0

Philipp
Philipp

Reputation: 11351

As suggested by @affan-pathan adding index did solve the issue:

CREATE INDEX text_option 
ON `article_criterias` (`article_id`, `group`, `option`);

CREATE INDEX numeric_option 
ON `article_criterias` (`article_id`, `group`, `option_val`);

Those two indexes cut the execute time of the above query form nearly 1 minute to less than 50 milliseconds!!

Upvotes: 0

Related Questions