Heisenberg
Heisenberg

Reputation: 8806

MySQL does not use index for BETWEEN and IN condition

EXPLAIN SELECT *
FROM
  events AS e
  WHERE EXISTS (
        SELECT * FROM dissidentList
        WHERE actor_id IN (e.source_actor_id, e.target_actor_id)
              AND e.event_date BETWEEN start_date AND end_date)
    AND EXISTS (
        SELECT * FROM governmentList
        WHERE actor_id IN (e.source_actor_id, e.target_actor_id)
              AND e.event_date BETWEEN start_date AND end_date);

I have indexes (MUL) on everything that appears in the query, including

events
-----------------------
event_date
source_actor_id
target_actor_id

dissidentList / governmentList
--------------------------
actor_id
start_date
end_date

However, nothing is being used. Why does this happen?

+----+--------------------+--------------------+------+---------------------+------+---------+------+----------+----------+-------------+
| id | select_type        | table              | type | possible_keys       | key  | key_len | ref  | rows     | filtered | Extra       |
+----+--------------------+--------------------+------+---------------------+------+---------+------+----------+----------+-------------+
|  1 | PRIMARY            | e                  | ALL  | NULL                | NULL | NULL    | NULL | 19874715 |   100.00 | Using where |
|  3 | DEPENDENT SUBQUERY | anh_governmentList | ALL  | start_date,end_date | NULL | NULL    | NULL |   217890 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | anh_dissidentList  | ALL  | start_date,end_date | NULL | NULL    | NULL |    47452 |   100.00 | Using where |
+----+--------------------+--------------------+------+---------------------+------+---------+------+----------+----------+-------------+

Upvotes: 0

Views: 100

Answers (1)

spencer7593
spencer7593

Reputation: 108410

Q: Why does this happen?

A: Likely the OR condition is disabling MySQL from considering using the index on the actor_id column. Note that the foo IN (a,b) is shorthand for (foo = a OR foo = b)

It is possible that MySQL might make use of a composite index, for example:

 ... ON dissidentList (start_date, end_date, actor_id)
 ... ON governmentList (start_date, end_date, actor_id)

The EXPLAIN output would show "Using index" if MySQL uses a covering index.

For table events, there are no sargable predicates. MySQL has to evaluate the EXISTS predicates for every row in the events table. (Also, the query is returning every column in the table, and there's no "covering" index that satisfies that SELECT list.)


FOLLOWUP

Q: What to do in this situation?

A: If the addition of covering indexes on dissidentList and governmentList don't improve performance of the query, it may be possible to produce an equivalent result using JOIN operations. This might give better performance, but it will depend on the execution plan, and the cardinality.

Getting an equivalent result with a JOIN operation would require that the events table have a PRIMARY KEY (or a UNIQUE KEY on non-null column(s)). Because the JOIN operation can return multiple matches to events, we'd need to add a GROUP BY to eliminate the duplicates, and we'd use the PRIMARY KEY or UNIQUE KEY columns for that.

Assuming that id is the name of the PRIMARY KEY column...

SELECT e.*
  FROM events e
  JOIN dissidentList f
    ON f.start_date <= e.event_date
   AND f.end_date   >= e.event_date
   AND f.actor_id   IN (e.source_actor_id, e.target_actor_id)
  JOIN governmentList g
    ON g.start_date <= e.event_date
   AND g.end_date   >= e.event_date
   AND g.actor_id   IN (e.source_actor_id, e.target_actor_id)
 GROUP BY e.id

Suitable covering indexes for both dissidentList and governmentList would improve performance:

... ON dissidentList (start_date, end_date, actor_id)
... ON governmentList (start_date, end_date, actor_id)

(We'd want EXPLAIN to show a "range" scan operation, and "Using index" in the extra column.)

MySQL may be able to make use of the PRIMARY KEY index on events to optimize the GROUP BY operation. If not, and if there a lot of matching rows in dissidentList and governmentList, this query could produce a very large intermediate result. If the EXPLAIN for this query shows 'Using temporary; Using filesort', this may be less efficient than the original. If the return from the query is a very small subset of events, it may be more efficient.

Upvotes: 2

Related Questions