Reputation: 2075
My table 'my_logs' have about 20,000,000 records, and I want to find out how many logs I have in each date within a few days.
I want to have a result like
+------------+---------+
| date | count |
+------------+---------+
| 2016-07-01 | 1623 |
| 2016-07-02 | 1280 |
| 2016-07-03 | 2032 |
+------------+---------+
This query below only take me milliseconds to finish, that's good
SELECT DATE_FORMAT(created_at, '%Y-%m-%d') as date,
COUNT(*) as count
FROM my_logs
WHERE created_at BETWEEN '2016-07-01' AND '2016-07-04'
GROUP BY DATE_FORMAT(created_at, '%Y-%m-%d')
The Explain of query:
+------------+---------+-------+-----------------------------+
|select_type | table | type | possible_keys |
+------------+---------+-------+-----------------------------+
| SIMPLE | my_logs| index | index_my_logs_on_created_at |
+------------+---------+-------+-----------------------------+
+-----------------------------+---------+----------+
| key | key_len | rows |
+-----------------------------+---------+----------+
| index_my_logs_on_created_at | 10 | 23458462 |
+-----------------------------+---------+----------+
+-----------------------------------------------------------+
| Extra |
+-----------------------------------------------------------+
| Using where; Using index; Using temporary; Using filesort |
+-----------------------------------------------------------+
However, I need to convert the timezone of each record to fit the time in my country, and I need to group by the 'Date' information, so I need to convert the column itself.
Both
SELECT COUNT(*)
FROM my_logs
WHERE DATE_ADD(created_at, INTERVAL 8 HOUR) BETWEEN '2016-07-01' AND '2016-07-04'
GROUP BY DATE_FORMAT(DATE_ADD(created_at, INTERVAL 8 HOUR), '%Y-%m-%d')
and
SELECT COUNT(*)
FROM my_logs
WHERE CONVERT_TZ(created_at, "+00:00", "+08:00") BETWEEN '2016-07-01' AND '2016-07-04'
GROUP BY DATE_FORMAT(CONVERT_TZ(created_at, "+00:00", "+08:00"),
'%Y-%m-%d')
take me about 12s to finish the query, it is unbearable slow!!
(The Explain is the same as the query in the top)
I think it is common problem but I can't find a good way to deal with it, does anyone has a more efficient way to do it? Thanks!
Upvotes: 6
Views: 3348
Reputation: 142316
Which datatype, TIMESTAMP
vs. DATETIME
, did you use? (But, I'll ignore that.)
Do not "hide" an indexed column (created_at
) inside any function (CONVERT_TZ()
). It makes it so that the WHERE
clause cannot use the index and must scan the table instead. This fix is simple:
WHERE created_at >= '2016-07-01' - INTERVAL 8 HOUR
AND created_at < '2016-07-04' - INTERVAL 8 HOUR
(or use CONVERT_TZ
). Note that I also fixed the bug wherein you included midnight from the 4th. Note: Even + INTERVAL...
is effectively a function.
Expressions in the SELECT
and the GROUP BY
are far less critical to performance.
Upvotes: 5