Martin AJ
Martin AJ

Reputation: 6697

Can MySQL use Indexes when there is OR between conditions?

I have two queries plus its own EXPLAIN's results:

One:

SELECT * 
FROM notifications 
WHERE id = 5204 OR seen = 3

enter image description here

Benchmark (for 10,000 rows): 0.861


Two:

SELECT h.* FROM ((SELECT n.* from notifications n WHERE id = 5204) 
                    UNION ALL
                 (SELECT n.* from notifications n WHERE seen = 3)) h 

enter image description here

Benchmark (for 10,000 rows): 2.064


The result of two queries above is identical. Also I have these two indexes on notifications table:

notifications(id) -- this is PK
notification(seen)

As you know, OR usually prevents effective use of indexes, That's why I wrote second query (by UNION). But after some tests I figured it out which still using OR is much faster that using UNION. So I'm confused and I really cannot choose the best option in my case.

Based on some logical and reasonable explanations, using union is better, but the result of benchmark says using OR is better. May you please help me should I use which approach?

Upvotes: 8

Views: 3164

Answers (4)

Rick James
Rick James

Reputation: 142296

(The Question is quite old, but I feel the need to point out some flaws -- both then and now.)

  • The two queries results are not necessarily identical. The OR will avoid any dups; the UNION ALL will [potentially] show a row twice (once for id=5204, once for seen=3.

  • Newer versions of MySQL will better optimize the UNION ALL by avoiding the need for the temp table to collect the results of each sub-query. (This optimization has limitations. UNION DISTINCT continues to require a temp table.) (Notice "Using temporary")

  • "Index merge union" happens so rarely, it is hard to predict when it is used. "Index merge intersect" almost always indicates that a composite index would make the query faster.

  • SELECT * is part of the slow-down. It may be better to do something like the following. It fetched ids more efficiently, the reaches back into the table to get the potentially bulky "*". If there are big columns, it may explain why UNION was so much slower.

      SELECT *
          FROM ( fetch just `id` from the OR or UNION or whatever ) AS x
          JOIN tbl USING (id);
    

Upvotes: 0

Quassnoi
Quassnoi

Reputation: 425401

index_merge, as the name suggests, combines the primary keys of two indexes using the Sort Merge Join or Sort Merge Union for AND and OR conditions, appropriately, and then looks up the rest of the values in the table by PK.

For this to work, conditions on both indexes should be so that each index would yield primary keys in order (your conditions are).

You can find the strict definition of the conditions in the docs, but in a nutshell, you should filter by all parts of the index with an equality condition, plus possibly <, =, or > on the PK.

If you have an index on (col1, col2, col3), this should be col1 = :val1 AND col2 = :val2 AND col3 = :val3 [ AND id > :id ] (the part in the square brackets is not necessary).

The following conditions won't work:

col1 = :val1 -- you omit col2 and col3

col1 = :val1 AND col2 = :val2 AND col3 > :val3 -- you can only use equality on key parts

As a free side effect, your output is sorted by id.

You could achieve the similar results using this:

SELECT  *
FROM    (
        SELECT  5204 id
        UNION ALL
        SELECT  id
        FROM    mytable
        WHERE   seen = 3
                AND id <> 5204
        ) q
JOIN    mytable m
ON      m.id = q.id

, except that in earlier versions of MySQL the derived table would have to be materialized which would definitely make the query performance worse, and your results would not have been ordered by id anymore.

In short, if your query allows index_merge(union), go for it.

Upvotes: 5

jkavalik
jkavalik

Reputation: 1316

The answer is contained in your question. The EXPLAIN output for OR says Using union(PRIMARY, seen) - that means that the index_merge optimization is being used and the query is actually executed by unioning results from the two indexes.

So MySQL can use index in some cases and it does in this one. But the index_merge is not always available or is not used because the statistics of the indexes say it won't be worth it. In those cases OR may be a lot slower than UNION (or not, you need to always check both versions if you are not sure).

In your test you "got lucky" and MySQL did the right optimization for you automatically. It is not always so.

Upvotes: 4

John Bollinger
John Bollinger

Reputation: 180286

The query plan for the OR case appears to indicate that MySQL is indeed using indexes, so evidently yes, it can do, at least in this case. That seems entirely reasonable, because there is an index on seen, and id is the PK.

Based on some logical and reasonable explanations, using union is better, but the result of benchmark says using OR is better.

If "logical and reasonable explanations" are contradicted by reality, then it is safe to assume that the logic is flawed or the explanations are wrong or inapplicable. Performance is notoriously difficult to predict; performance testing is essential where speed is important.

May you please help me should I use which approach?

You should use the one that tests faster on input that adequately models that which the program will see in real use.

Note also, however, that your two queries are not semantically equivalent: if the row with id = 5204 also has seen = 3 then the OR query will return it once, but the UNION ALL query will return it twice. It is pointless to choose between correct code and incorrect code on any basis other than which one is correct.

Upvotes: 11

Related Questions