Reputation: 10974
I have a translation table (MySQL MyISAM) for words (empty right now but will get real big once the words are inserted).
There are only 2 kinds of SELECTs to request information on that table:
SELECT translation FROM table WHERE lang_original='en' AND lang_target='de' AND word = 'house'
//this will return 'Haus'
SELECT lang_original,translation FROM table WHERE lang_target='de' AND word = 'house'
//this will return 'en','Haus'
Normally i would have these fields with separate indexes:
But then, I just read about combined indexes in another thread, but I still have some doubts on how to use them.
Should I only have 3 indexes, like this?
Should those combined indexes be UNIQUE or INDEX type? Why?
Do I also need to set separated indexes for lang_original, lang_target and word despite the fact that they won't ever be filtered by their own?
Update 1 fixed = instead of LIKE
Upvotes: 1
Views: 533
Reputation: 20873
As I mentioned above, using EXPLAIN
will show you how queries are executed and which indexes are or are not being used.
1) Should I only have 3 indexes, like this?
- id_word (primary / auto increment)
- combined: lang_target + word
- combined: lang_original + lang_target + word
Having those composite indexes would work. There's a lot of overlap between the two composite indexes, so maybe you could find a way to combine them like this:
(lang_target, word(10), lang_original)
I'd suggest using a sane prefix size for the word
column so they're a set length (say, 10). The vast majority of words are different within the first 10 chars, so circumstances where it has to read from disk because they're identical through 10 chars will be rare. Prefix length is a balance between index size and execution speed. A bigger prefix will encounter fewer prefix collisions, but it will be a more memory-consuming index. On the other hand, a smaller prefix will save memory but could result in more prefix collisions (thus having to go to disk to make a final determination between a few records). Play with what works for you. Maybe it's 5, maybe it's 15.
Should those combined indexes be UNIQUE or INDEX type? Why?
If (lang_target, word, lang_original)
will be a unique combination (which seems to be the case), go ahead and use the UNIQUE
constraint, but only if you're not chopping word
with a prefix (in which case multiple records with the same partial word
value might exist). As an index, I think they'll still be functionally equivalent.
Do I also need to set separated indexes for lang_original, lang_target and word despite the fact that they won't ever be filtered by their own?
Columns in a composite index are used left to right. In (lang_target, word(10), lang_original)
, the index can be used if you're only querying with a condition on lang_target
, but if you have a condition on lang_original
, it couldn't be used unless you're also querying on both lang_target
and word
. So, only create other indexes for word
and/or lang_original
if you need to be querying without a lang_target
condition. See this doc page on multiple column indexes for more explanation on that.
One other thought:
If your lang_
columns are always going to be just two characters, make them CHAR
instead of VARCHAR
. For a column that size, VARCHAR
takes an extra byte with each value to specify data length, whereas it's a fixed length with CHAR
. With millions of records, every byte matters for size and speed.
Upvotes: 6
Reputation: 39951
As someone pointed out, don't use LIKE unless you need to.
Also, don't use UNIQUE unless the combination really is unique, it looks like there can only be one translation of each word so UNIQUE on (lang_target, word) should be okey but I think performance is better on non unique indexes, think that they don't need the unique check.
You only need one index for those two queries. Don't add more indexes than you need, each index comes with a small cost at insert and update.
(lang_target, word, lang_original). Order is important here as MySQL can use any left part of an index so both a query for *lang_target*, *lang_target + word* or *lang_target + word + lang_original* can use that single index.
Upvotes: 1