Andres SK
Andres SK

Reputation: 10974

Using combined indexes in MySQL

I have a translation table (MySQL MyISAM) for words (empty right now but will get real big once the words are inserted).

There are only 2 kinds of SELECTs to request information on that table:

by defining the original language

SELECT translation FROM table WHERE lang_original='en' AND lang_target='de' AND word = 'house'
//this will return 'Haus'

by letting the script auto-detect the original language

SELECT lang_original,translation FROM table WHERE lang_target='de' AND word = 'house'
//this will return 'en','Haus'

Normally i would have these fields with separate indexes:

But then, I just read about combined indexes in another thread, but I still have some doubts on how to use them.

  1. Should I only have 3 indexes, like this?

    • id_word (primary / auto increment)
    • combined: lang_target + word
    • combined: lang_original + lang_target + word
  2. Should those combined indexes be UNIQUE or INDEX type? Why?

  3. Do I also need to set separated indexes for lang_original, lang_target and word despite the fact that they won't ever be filtered by their own?

Update 1 fixed = instead of LIKE

Upvotes: 1

Views: 533

Answers (2)

Wiseguy
Wiseguy

Reputation: 20873

As I mentioned above, using EXPLAIN will show you how queries are executed and which indexes are or are not being used.

1) Should I only have 3 indexes, like this?

  • id_word (primary / auto increment)
  • combined: lang_target + word
  • combined: lang_original + lang_target + word

Having those composite indexes would work. There's a lot of overlap between the two composite indexes, so maybe you could find a way to combine them like this:

(lang_target, word(10), lang_original)

I'd suggest using a sane prefix size for the word column so they're a set length (say, 10). The vast majority of words are different within the first 10 chars, so circumstances where it has to read from disk because they're identical through 10 chars will be rare. Prefix length is a balance between index size and execution speed. A bigger prefix will encounter fewer prefix collisions, but it will be a more memory-consuming index. On the other hand, a smaller prefix will save memory but could result in more prefix collisions (thus having to go to disk to make a final determination between a few records). Play with what works for you. Maybe it's 5, maybe it's 15.

Should those combined indexes be UNIQUE or INDEX type? Why?

If (lang_target, word, lang_original) will be a unique combination (which seems to be the case), go ahead and use the UNIQUE constraint, but only if you're not chopping word with a prefix (in which case multiple records with the same partial word value might exist). As an index, I think they'll still be functionally equivalent.

Do I also need to set separated indexes for lang_original, lang_target and word despite the fact that they won't ever be filtered by their own?

Columns in a composite index are used left to right. In (lang_target, word(10), lang_original), the index can be used if you're only querying with a condition on lang_target, but if you have a condition on lang_original, it couldn't be used unless you're also querying on both lang_target and word. So, only create other indexes for word and/or lang_original if you need to be querying without a lang_target condition. See this doc page on multiple column indexes for more explanation on that.

One other thought: If your lang_ columns are always going to be just two characters, make them CHAR instead of VARCHAR. For a column that size, VARCHAR takes an extra byte with each value to specify data length, whereas it's a fixed length with CHAR. With millions of records, every byte matters for size and speed.

Upvotes: 6

Andreas Wederbrand
Andreas Wederbrand

Reputation: 39951

As someone pointed out, don't use LIKE unless you need to.

Also, don't use UNIQUE unless the combination really is unique, it looks like there can only be one translation of each word so UNIQUE on (lang_target, word) should be okey but I think performance is better on non unique indexes, think that they don't need the unique check.

You only need one index for those two queries. Don't add more indexes than you need, each index comes with a small cost at insert and update.

(lang_target, word, lang_original). Order is important here as MySQL can use any left part of an index so both a query for *lang_target*, *lang_target + word* or *lang_target + word + lang_original* can use that single index.

Upvotes: 1

Related Questions