Reputation: 8806
EXPLAIN SELECT *
FROM
events AS e
WHERE EXISTS (
SELECT * FROM dissidentList
WHERE actor_id IN (e.source_actor_id, e.target_actor_id)
AND e.event_date BETWEEN start_date AND end_date)
AND EXISTS (
SELECT * FROM governmentList
WHERE actor_id IN (e.source_actor_id, e.target_actor_id)
AND e.event_date BETWEEN start_date AND end_date);
I have indexes (MUL) on everything that appears in the query, including
events
-----------------------
event_date
source_actor_id
target_actor_id
dissidentList / governmentList
--------------------------
actor_id
start_date
end_date
However, nothing is being used. Why does this happen?
+----+--------------------+--------------------+------+---------------------+------+---------+------+----------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+--------------------+------+---------------------+------+---------+------+----------+----------+-------------+
| 1 | PRIMARY | e | ALL | NULL | NULL | NULL | NULL | 19874715 | 100.00 | Using where |
| 3 | DEPENDENT SUBQUERY | anh_governmentList | ALL | start_date,end_date | NULL | NULL | NULL | 217890 | 100.00 | Using where |
| 2 | DEPENDENT SUBQUERY | anh_dissidentList | ALL | start_date,end_date | NULL | NULL | NULL | 47452 | 100.00 | Using where |
+----+--------------------+--------------------+------+---------------------+------+---------+------+----------+----------+-------------+
Upvotes: 0
Views: 100
Reputation: 108410
Q: Why does this happen?
A: Likely the OR condition is disabling MySQL from considering using the index on the actor_id
column. Note that the foo IN (a,b)
is shorthand for (foo = a OR foo = b)
It is possible that MySQL might make use of a composite index, for example:
... ON dissidentList (start_date, end_date, actor_id)
... ON governmentList (start_date, end_date, actor_id)
The EXPLAIN output would show "Using index" if MySQL uses a covering index.
For table events
, there are no sargable predicates. MySQL has to evaluate the EXISTS
predicates for every row in the events
table. (Also, the query is returning every column in the table, and there's no "covering" index that satisfies that SELECT list.)
FOLLOWUP
Q: What to do in this situation?
A: If the addition of covering indexes on dissidentList and governmentList don't improve performance of the query, it may be possible to produce an equivalent result using JOIN operations. This might give better performance, but it will depend on the execution plan, and the cardinality.
Getting an equivalent result with a JOIN operation would require that the events
table have a PRIMARY KEY (or a UNIQUE KEY on non-null column(s)). Because the JOIN operation can return multiple matches to events
, we'd need to add a GROUP BY to eliminate the duplicates, and we'd use the PRIMARY KEY or UNIQUE KEY columns for that.
Assuming that id
is the name of the PRIMARY KEY column...
SELECT e.*
FROM events e
JOIN dissidentList f
ON f.start_date <= e.event_date
AND f.end_date >= e.event_date
AND f.actor_id IN (e.source_actor_id, e.target_actor_id)
JOIN governmentList g
ON g.start_date <= e.event_date
AND g.end_date >= e.event_date
AND g.actor_id IN (e.source_actor_id, e.target_actor_id)
GROUP BY e.id
Suitable covering indexes for both dissidentList
and governmentList
would improve performance:
... ON dissidentList (start_date, end_date, actor_id)
... ON governmentList (start_date, end_date, actor_id)
(We'd want EXPLAIN to show a "range" scan operation, and "Using index" in the extra column.)
MySQL may be able to make use of the PRIMARY KEY index on events
to optimize the GROUP BY operation. If not, and if there a lot of matching rows in dissidentList
and governmentList
, this query could produce a very large intermediate result. If the EXPLAIN for this query shows 'Using temporary; Using filesort', this may be less efficient than the original. If the return from the query is a very small subset of events, it may be more efficient.
Upvotes: 2