Using combined indexes in MySQL

Question

I have a translation table (MySQL MyISAM) for words (empty right now but will get real big once the words are inserted).

id_word
lang_original (the language from the original word) VARCHAR(2)
lang_target (the language from the translated word) VARCHAR(2)
word (the word itself) VARCHAR(50)
translation (the translation) VARCHAR(50)

There are only 2 kinds of SELECTs to request information on that table:

by defining the original language

SELECT translation FROM table WHERE lang_original='en' AND lang_target='de' AND word = 'house'
//this will return 'Haus'

by letting the script auto-detect the original language

SELECT lang_original,translation FROM table WHERE lang_target='de' AND word = 'house'
//this will return 'en','Haus'

Normally i would have these fields with separate indexes:

id_word
lang_original
lang_target
word

But then, I just read about combined indexes in another thread, but I still have some doubts on how to use them.

Should I only have 3 indexes, like this?
- id_word (primary / auto increment)
- combined: lang_target + word
- combined: lang_original + lang_target + word
Should those combined indexes be UNIQUE or INDEX type? Why?
Do I also need to set separated indexes for lang_original, lang_target and word despite the fact that they won't ever be filtered by their own?

Update 1 fixed = instead of LIKE

Wiseguy · Accepted Answer

As I mentioned above, using EXPLAIN will show you how queries are executed and which indexes are or are not being used.

1) Should I only have 3 indexes, like this?

id_word (primary / auto increment)

combined: lang_target + word

combined: lang_original + lang_target + word

Having those composite indexes would work. There's a lot of overlap between the two composite indexes, so maybe you could find a way to combine them like this:

(lang_target, word(10), lang_original)

I'd suggest using a sane prefix size for the word column so they're a set length (say, 10). The vast majority of words are different within the first 10 chars, so circumstances where it has to read from disk because they're identical through 10 chars will be rare. Prefix length is a balance between index size and execution speed. A bigger prefix will encounter fewer prefix collisions, but it will be a more memory-consuming index. On the other hand, a smaller prefix will save memory but could result in more prefix collisions (thus having to go to disk to make a final determination between a few records). Play with what works for you. Maybe it's 5, maybe it's 15.

Should those combined indexes be UNIQUE or INDEX type? Why?

If (lang_target, word, lang_original) will be a unique combination (which seems to be the case), go ahead and use the UNIQUE constraint, but only if you're not chopping word with a prefix (in which case multiple records with the same partial word value might exist). As an index, I think they'll still be functionally equivalent.

Do I also need to set separated indexes for lang_original, lang_target and word despite the fact that they won't ever be filtered by their own?

Columns in a composite index are used left to right. In (lang_target, word(10), lang_original), the index can be used if you're only querying with a condition on lang_target, but if you have a condition on lang_original, it couldn't be used unless you're also querying on both lang_target and word. So, only create other indexes for word and/or lang_original if you need to be querying without a lang_target condition. See this doc page on multiple column indexes for more explanation on that.

One other thought: If your lang_ columns are always going to be just two characters, make them CHAR instead of VARCHAR. For a column that size, VARCHAR takes an extra byte with each value to specify data length, whereas it's a fixed length with CHAR. With millions of records, every byte matters for size and speed.

Using combined indexes in MySQL

by defining the original language

by letting the script auto-detect the original language

Answers (2)

Related Questions