Reputation: 11351
I'm stuck with a performance issue:
A shop has an article filter with categories "color", "size", "gender" and "feature". All those details are stored inside an article_criterias
table, that looks like this:
Table layout of article_criterias
is; this table has about 36.000 rows:
article_id | group | option | option_val
100 | "size" | "35" | 35.00
100 | "size" | "36" | 36.00
100 | "size" | "36½" | 36.50
100 | "color" | "40" | 40.00
100 | "color" | "50" | 50.00
100 | "gender" | "1" | 1.00
101 | "size" | "40" | 40.00
...
We have a SQL query that is built dynamically, based on which criteria are currently selected. The query is good for 2-3 criteria, but will get very slow when selecting more than 5 options (each additional INNER JOIN roughly doubles the execution time)
How can we make this SQL faster, maybe even replacing the inner joins with a more performant concept?
This is the query (the logic is correct, just the performance is bad):
-- This SQL is generated when the user selected the following criteria
-- gender: 1
-- color: 80 + 30
-- size 36 + 37 + 38 + 39 + 42 + 46
SELECT
criteria.group AS `key`,
criteria.option AS `value`
FROM articles
INNER JOIN article_criterias AS criteria ON articles.id = criteria.article_id
INNER JOIN article_criterias AS criteria_gender
ON criteria_gender.article_id = articles.id AND criteria_gender.group = "gender"
INNER JOIN article_criterias AS criteria_color1
ON criteria_color1.article_id = articles.id AND criteria_color1.group = "color"
INNER JOIN article_criterias AS criteria_size2
ON criteria_size2.article_id = articles.id AND criteria_size2.group = "size"
INNER JOIN article_criterias AS criteria_size3
ON criteria_size3.article_id = articles.id AND criteria_size3.group = "size"
INNER JOIN article_criterias AS criteria_size4
ON criteria_size4.article_id = articles.id AND criteria_size4.group = "size"
INNER JOIN article_criterias AS criteria_size5
ON criteria_size5.article_id = articles.id AND criteria_size5.group = "size"
INNER JOIN article_criterias AS criteria_size6
ON criteria_size6.article_id = articles.id AND criteria_size6.group = "size"
INNER JOIN article_criterias AS criteria_size7
ON criteria_size7.article_id = articles.id AND criteria_size7.group = "size"
WHERE
AND (criteria_gender.option IN ("1"))
AND (criteria_color1.option IN ("80", "30"))
AND (criteria_size2.option_val BETWEEN 35.500000 AND 36.500000)
AND (criteria_size3.option_val BETWEEN 36.500000 AND 37.500000)
AND (criteria_size4.option_val BETWEEN 37.500000 AND 38.500000)
AND (criteria_size5.option_val BETWEEN 38.500000 AND 39.500000)
AND (criteria_size6.option_val BETWEEN 41.500000 AND 42.500000)
AND (criteria_size7.option_val BETWEEN 45.500000 AND 46.500000)
Upvotes: 0
Views: 1738
Reputation: 94914
Key/value tables are really a nuisance. However, in order to find certain criteria matches aggregate your data:
select
a.*,
ac.group AS "key",
ac.option AS "value"
from articles a
join article_criterias ac on ac.article_id = a.article_id
where a.article_id in
(
select article_id
from article_criterias
group by article_id
having sum("group" = 'gender' and option = '1') > 0
and sum("group" = 'color' and option in ('30','80')) > 0
and sum("group" = 'size' and option_val between 35.5 and 36.5) > 0
and sum("group" = 'size' and option_val between 36.5 and 37.5) > 0
and sum("group" = 'size' and option_val between 37.5 and 38.5) > 0
and sum("group" = 'size' and option_val between 38.5 and 39.5) > 0
and sum("group" = 'size' and option_val between 41.5 and 42.5) > 0
and sum("group" = 'size' and option_val between 45.5 and 46.5) > 0
)
order by a.article_id, ac.group, ac.option;
This gets you all articles that are available for gender 1, colors 30 and/or 80, and all listed size ranges, along with all their options. (The size ranges are a bit strange, though; a size 36.5 would meet two ranges for instance.) You get the idea: group by article_id and use HAVING
in order to only get article_ids that meet the critria.
As to indexes you'll want
create index idx on article_criterias(article_id, "group", option, option_val);
Upvotes: 2
Reputation: 5060
I understand indexs you create solved your problem, but just to play with a pseudo alternative (which avoid multiple INNER JOIN), can you try something like this? (I did test with just three condition. Your condition should be inserted in inner query. To select only the record who meets all conditions, you have to change last WHERE condition (WHERE max = 3, using the number of conditions you wrote above; so if you are using 5 conditions, you should write WHERE max = 5). (I changed the name of columns groups and option, for my ease of use). It's just an idea so pls do some tests and check for performance and pls let me know...
CREATE TABLE CRITERIA (ARTICLE_ID INT, GROU VARCHAR(10), OPT VARCHAR(20), OPTION_VAL NUMERIC(12,2));
CREATE TABLE ARTICLES (ID INT);
INSERT INTO CRITERIA VALUES (100,'size','35',35);
INSERT INTO CRITERIA VALUES (100,'size','36',36);
INSERT INTO CRITERIA VALUES (100,'color','40',40);
INSERT INTO CRITERIA VALUES (100,'gender','1',1);
INSERT INTO CRITERIA VALUES (200,'size','36.2',36.2);
INSERT INTO CRITERIA VALUES (300,'size','36.2',36.2);
INSERT INTO ARTICLES VALUES (100);
INSERT INTO ARTICLES VALUES (200);
INSERT INTO ARTICLES VALUES (300);
-------------------------------------------------------
SELECT D.article_id, D.GROU, D.OPT
FROM (SELECT C.*
, @o:=CASE WHEN @h=ARTICLE_ID THEN @o ELSE cumul END max
, @h:=ARTICLE_ID AS a_id
FROM (SELECT article_id,
B.GROU, B.OPT,
@r:= CASE WHEN @g = B.ARTICLE_ID THEN @r+1 ELSE 1 END cumul,
@g:= B.ARTICLE_ID g
FROM CRITERIA B
CROSS JOIN (SELECT @g:=0, @r:=0) T1
WHERE (B.GROU='gender' AND B.OPT IN ('1'))
OR (B.GROU='color' AND B.OPT IN ('40', '30'))
OR (B.GROU='size' AND B.OPT BETWEEN 35.500000 AND 36.500000)
ORDER BY article_id
) C
CROSS JOIN (SELECT @o:=0, @h:=0) T2
ORDER BY ARTICLE_ID, CUMUL DESC) D
WHERE max=3
;
Output:
article_id GROU OPT
100 gender 1
100 color 40
100 size 36
Upvotes: 0
Reputation: 11351
As suggested by @affan-pathan adding index did solve the issue:
CREATE INDEX text_option
ON `article_criterias` (`article_id`, `group`, `option`);
CREATE INDEX numeric_option
ON `article_criterias` (`article_id`, `group`, `option_val`);
Those two indexes cut the execute time of the above query form nearly 1 minute to less than 50 milliseconds!!
Upvotes: 0