Reputation: 13110
I have the following query:
SELECT COUNT(*)
FROM datum d
JOIN datum_type dt
ON dt.datum_id = d.id
AND dt.type_id = '3'
WHERE d.added_time >= DATE_FORMAT(CURDATE(), '%Y-%m')
AND d.added_time < DATE_FORMAT(CURDATE() + INTERVAL 1 MONTH, '%Y-%m')
There are indexes on d.id (PRIMARY), d.added_time, dt.datum_id and dt.type_id
The current explain plan is:
+----+-------------+-------+--------+--------------------+---------+---------+-------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+--------------------+---------+---------+-------------+--------+-------------+
| 1 | SIMPLE | dt | ref | type_id,datum_id | type_id | 1 | const | 602628 | |
| 1 | SIMPLE | d | eq_ref | PRIMARY,added_time | PRIMARY | 8 | dt.datum_id | 1 | Using where |
+----+-------------+-------+--------+--------------------+---------+---------+-------------+--------+-------------+
As we have datum records for quite some time, it appears to be joining the type in first using the datum.id PRIMARY and then scanning each joined row to see if the datum.added_time is within the range.
I tried using the added_time index but the explain plan was:
+----+-------------+-------+-------+------------------+------------+---------+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+------------------+------------+---------+------+---------+--------------------------+
| 1 | SIMPLE | d | index | added_time | added_time | 4 | NULL | 6195194 | Using where; Using index |
| 1 | SIMPLE | dt | ref | type_id,datum_id | datum_id | 8 | d.id | 1 | Using where |
+----+-------------+-------+-------+------------------+------------+---------+------+---------+--------------------------+
Which takes almost as long as there are so many datum_types of different datum_type.type_id within the datum.added_time range.
Is there some combination of index that may speed this up?
Upvotes: 1
Views: 53
Reputation: 1269693
Let me assume that added_time
is datetime
or date
. Then, you should express the conditions as strings. Instead, use date
constants:
SELECT COUNT(*)
FROM datum d JOIN
datum_type dt
ON dt.datum_id = d.id AND
dt.type_id = '3'
WHERE d.added_time >= DATE_SUB(CURDATE(), INTERVAL DAY(CURDATE()) - 1 DAY) AND
d.added_time < DATE_ADD(DATE_SUB(CURDATE(), INTERVAL DAY(CURDATE()) - 1 DAY), INTERVAL 1 MONTH);
This can take advantage of an index on datum(added_time, id)
and datum_type(datum_id, type_id)
.
If there are no duplicate records (for the count) coming from datum_type
, I would suggest that you rewrite the query as:
SELECT COUNT(*)
FROM datum d
WHERE d.added_time >= DATE_SUB(CURDATE(), INTERVAL DAY(CURDATE()) - 1 DAY) AND
d.added_time < DATE_ADD(DATE_SUB(CURDATE(), INTERVAL DAY(CURDATE()) - 1 DAY), INTERVAL 1 MONTH) AND
EXISTS (SELECT 1
FROM datum_type dt
WHERE dt.datum_id = d.id AND dt.type_id = '3'
);
If type_id
is an integer, then you should drop the single quotes. Mixing different data types in SQL can confuse the optimize and prevent the use of indexes.
Upvotes: 1