林鼎棋
林鼎棋

Reputation: 2075

The best way to convert time zone efficiently in MYSQL query

My table 'my_logs' have about 20,000,000 records, and I want to find out how many logs I have in each date within a few days.

I want to have a result like

+------------+---------+ | date | count | +------------+---------+ | 2016-07-01 | 1623 | | 2016-07-02 | 1280 | | 2016-07-03 | 2032 | +------------+---------+

This query below only take me milliseconds to finish, that's good

SELECT  DATE_FORMAT(created_at, '%Y-%m-%d') as date,
        COUNT(*) as count
    FROM  my_logs
    WHERE  created_at BETWEEN '2016-07-01' AND '2016-07-04'
    GROUP BY  DATE_FORMAT(created_at, '%Y-%m-%d')

The Explain of query:

+------------+---------+-------+-----------------------------+ |select_type | table | type | possible_keys | +------------+---------+-------+-----------------------------+ | SIMPLE | my_logs| index | index_my_logs_on_created_at | +------------+---------+-------+-----------------------------+

+-----------------------------+---------+----------+ | key | key_len | rows | +-----------------------------+---------+----------+ | index_my_logs_on_created_at | 10 | 23458462 | +-----------------------------+---------+----------+

+-----------------------------------------------------------+ | Extra | +-----------------------------------------------------------+ | Using where; Using index; Using temporary; Using filesort | +-----------------------------------------------------------+

However, I need to convert the timezone of each record to fit the time in my country, and I need to group by the 'Date' information, so I need to convert the column itself.

Both

SELECT  COUNT(*)
    FROM  my_logs
    WHERE  DATE_ADD(created_at, INTERVAL 8 HOUR) BETWEEN '2016-07-01' AND '2016-07-04'
    GROUP BY  DATE_FORMAT(DATE_ADD(created_at, INTERVAL 8 HOUR), '%Y-%m-%d')

and

SELECT  COUNT(*)
    FROM  my_logs
    WHERE  CONVERT_TZ(created_at, "+00:00", "+08:00") BETWEEN '2016-07-01' AND '2016-07-04'
    GROUP BY  DATE_FORMAT(CONVERT_TZ(created_at, "+00:00", "+08:00"),

'%Y-%m-%d')

take me about 12s to finish the query, it is unbearable slow!!

(The Explain is the same as the query in the top)


I think it is common problem but I can't find a good way to deal with it, does anyone has a more efficient way to do it? Thanks!

Upvotes: 6

Views: 3348

Answers (1)

Rick James
Rick James

Reputation: 142316

Which datatype, TIMESTAMP vs. DATETIME, did you use? (But, I'll ignore that.)

Do not "hide" an indexed column (created_at) inside any function (CONVERT_TZ()). It makes it so that the WHERE clause cannot use the index and must scan the table instead. This fix is simple:

WHERE created_at >= '2016-07-01' - INTERVAL 8 HOUR
  AND created_at  < '2016-07-04' - INTERVAL 8 HOUR

(or use CONVERT_TZ). Note that I also fixed the bug wherein you included midnight from the 4th. Note: Even + INTERVAL... is effectively a function.

Expressions in the SELECT and the GROUP BY are far less critical to performance.

Upvotes: 5

Related Questions