Reputation: 43
I've database in which I'm storing japanese dictionary: words, readings, tags, types, meanings in other languages (english is the most important here, but there's also a few other) and so on.
Now, I want to create an interface using Datatables js plugin, so user could see table and use some filtering options (like, show only verbs, or find entries containing "dog"). I'm struggling, however, with query which can be pretty slow when using filtering... I already speed it up a lot, but it still not good.
This is my basic query:
select
v.id,
(
select group_concat(distinct vke.kanji_element separator '; ') from vocabulary_kanji_element as vke
where vke.vocabulary_id = v.id
) kanji_notation,
(
select group_concat(distinct vre.reading_element separator '; ') from vocabulary_reading_element as vre
where vre.vocabulary_id = v.id
) reading_notation,
(
select group_concat(distinct vsg.gloss separator '; ') from vocabulary_sense_gloss as vsg
join vocabulary_sense as vs on vsg.sense_id = vs.id
join language as l on vsg.language_id = l.id and l.language_code = 'eng'
where vs.vocabulary_id = v.id
) meanings,
(
select group_concat(distinct pos.name_code separator '; ') from vocabulary_sense as vs
join vocabulary_sense_has_pos as vshp on vshp.sense_id = vs.id
join part_of_speech as pos on pos.id = vshp.pos_id
where vs.vocabulary_id = v.id
) pos
from vocabulary as v
join vocabulary_sense as vs on vs.vocabulary_id = v.id
join vocabulary_sense_gloss as vsg on vsg.sense_id = vs.id
join vocabulary_kanji_element as vke on vke.vocabulary_id = v.id
join vocabulary_reading_element as vre on vre.vocabulary_id = v.id
join language as l on l.id = vsg.language_id and l.language_code = 'eng'
join vocabulary_sense_has_pos as vshp on vshp.sense_id = vs.id
join part_of_speech as pos on pos.id = vshp.pos_id
where
-- pos.name_code = 'n' and
(vsg.gloss like '%eat%' OR vke.kanji_element like '%eat%' OR vre.reading_element like '%eat%')
group by v.id
order by v.id desc
-- limit 3900, 25
Output is something like this:
|id | kanji_notation | reading_notation | meanings | pos |
---------------------------------------------------------------
|117312| お手; 御手 | おて | hand; arm |n; int|
Right now (working on my local machine), If there's no WHERE statement, but with limit, it works fast - about 0,140 sec. But when text filtering is on, execution time wents up to 6,5 sec, and often above. With filtering on part_of_speech first, its like 5,5 sec. 3 sec would be ok, but 6 is just way too long.
There's 1 155 897 records in table vocabulary_sense_gloss, so I think that's not a lot.
CREATE TABLE `vocabulary_sense_gloss` (
`id` MEDIUMINT(8) UNSIGNED NOT NULL AUTO_INCREMENT,
`sense_id` MEDIUMINT(8) UNSIGNED NOT NULL,
`gloss` VARCHAR(255) NOT NULL,
`language_id` MEDIUMINT(8) UNSIGNED NOT NULL,
PRIMARY KEY (`id`),
INDEX `vocabulary_sense_gloss_vocabulary_sense_id` (`sense_id`),
INDEX `vocabulary_sense_gloss_language_id` (`language_id`),
FULLTEXT INDEX `vocabulary_sense_gloss_gloss` (`gloss`),
CONSTRAINT `vocabulary_sense_gloss_language_id` FOREIGN KEY (`language_id`) REFERENCES `language` (`id`),
CONSTRAINT `vocabulary_sense_gloss_vocabulary_sense_id` FOREIGN KEY (`sense_id`) REFERENCES `vocabulary_sense` (`id`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
;
I wonder, is there some way to optimize it? Or maybe should I change my database? I was trying to use fulltext search, but it's not much faster, and seems to work only on full terms, so its no use. Similiar story with using 'eat%' instead of '%eat%': it won't return what I want.
I tried to divide vocabulary_sense_gloss in two tables - one with english only terms, and other with the rest. Since users would use usually english anyway, it would make things faster, but I'm not sure if that's a good approach.
Also, I was trying to change VARCHAR to CHAR. It seemed to speed up execution time, though table size went up a lot.
Upvotes: 2
Views: 60
Reputation: 108706
This WHERE
clause has extremely poor performance.
(vsg.gloss like '%eat%' OR
vke.kanji_element like '%eat%' OR
vre.reading_element like '%eat%')
Why? First of all: column LIKE '%constant%'
requires the query engine to examine every possible value of column
. It can't possibly use an index because of the leading %
in the constant search term.
Second: the OR clause means the query planner has to scan the results three different times.
What are you going to do to improve this? It won't be easy. You need to figure out how to use column LIKE 'constant%'
search terms eliminating the leading %
from the constants.
Once you do that, you may be able to beat the triple scan of your vast joined result set with a construct like this
...
WHERE v.id IN
(SELECT sense_id AS id
FROM vocabulary_sense_gloss
WHERE gloss LIKE 'eat%'
UNION
SELECT vocabulary_id AS id
FROM vocabulary_kanji_element
WHERE kanji_element LIKE 'eat%'
UNION
SELECT vocabulary_id AS id
FROM vocabulary_reading_element
WHERE reading_element LIKE 'eat%'
)
This will pull the id
numbers of the relevant words directly, rather than from the result of a multiway JOIN. For this to be fast, your vocabulary_sense_gloss will need an index on (vocabulary_sense_gloss, sense_id)
. The other two tables will need similar indexes.
Upvotes: 3