Reputation: 163234
I have a table (logs) that has the following columns (there are others, but these are the important ones): Timestamp and Duration.
Basically each row is a record of an event that starts at one time and ends at another. This table currently has a few hundred thousand rows in it, and I expect it to grow into the millions. To speed up queries, I have added another column, EndTime, with precomputed values: to calculate EndTime, I add the number of seconds in Duration to the Timestamp field.
Now what I want to do is run a query that counts the number of rows whose start (Timestamp) and end (EndTime) times straddle a certain point in time, i.e. whose interval contains that point. I then want to run this query for every second over a large timespan (such as a year). I would also like to count the number of rows that start at a particular point in time, and the number that end at a particular point in time.
I have created the following query:
SELECT
    `dates`.`date`,
    COUNT(*) AS `total`,
    SUM(IF(`dates`.`date` = `logs`.`Timestamp`, 1, 0)) AS `new`,
    SUM(IF(`dates`.`date` = `logs`.`EndTime`, 1, 0)) AS `dropped`
FROM
    `logs`,
    (SELECT DATE_ADD("2010-04-13 09:45:00", INTERVAL `number` SECOND) AS `date`
     FROM `numbers` LIMIT 120) AS `dates`
WHERE `dates`.`date` BETWEEN `logs`.`Timestamp` AND `logs`.`EndTime`
GROUP BY `dates`.`date`;
Note that the numbers table exists strictly to make enumerating a date range easy. It is a table with one column, number, containing the values 1, 2, 3, 4, 5, etc.
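For anyone building a similar helper table, one common way to populate it is seeding a few rows and repeatedly doubling with INSERT ... SELECT. This is just a sketch (the table name matches the question; the doubling approach is one option among many):

```sql
CREATE TABLE `numbers` (
    `number` INT UNSIGNED NOT NULL PRIMARY KEY
);

-- Seed with 1..8, then double the row count with each INSERT ... SELECT
INSERT INTO `numbers` (`number`) VALUES (1),(2),(3),(4),(5),(6),(7),(8);
INSERT INTO `numbers` SELECT `number` + 8  FROM `numbers`;  -- now 1..16
INSERT INTO `numbers` SELECT `number` + 16 FROM `numbers`;  -- now 1..32
-- ...repeat, doubling the range each time, until the table is large enough
```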
This gives me exactly what I am looking for... a table with 4 columns: date, total, new, and dropped.
The trouble is, this query can take a significant amount of time to execute. To go through 120 seconds (as shown in the query), it takes about 10 seconds. I suspect that this is about as fast as I am going to get it, but I thought I would ask here if anyone had any ideas for improving the performance of this query.
Any suggestions would be most helpful. Thank you for your time.
Edit: I have indexes on Timestamp and EndTime.
The output of EXPLAIN on my query:
"id";"select_type";"table";"type";"possible_keys";"key";"key_len";"ref";"rows";"Extra"
"1";"PRIMARY";"<derived2>";"ALL";NULL;NULL;NULL;NULL;"120";"Using temporary; Using filesort"
"1";"PRIMARY";"logs";"ALL";"Timestamp,EndTime";NULL;NULL;NULL;"296159";"Range checked for each record (index map: 0x6)"
"2";"DERIVED";"numbers";"index";NULL;"PRIMARY";"4";NULL;"35546940";"Using index"
When I run ANALYZE TABLE on my logs table, it reports status OK.
Upvotes: 5
Views: 16724
Reputation: 77400
Note in the EXPLAIN output that the join type for the logs table is "ALL" and the key is NULL, which means a full table scan is scheduled. The "Range checked for each record" message means that MySQL uses the range access method on logs after examining column values from somewhere else in the result. I take this to mean that once dates has been created, MySQL can perform a ranged join on logs using the second and third indices (likely those on Timestamp and EndTime) rather than performing a full table scan. If you only have indices on Timestamp and EndTime separately, try adding an index on both, which might result in a more efficient join type (e.g. index_merge rather than range):
CREATE INDEX `start_end` ON `logs` (`Timestamp`, `EndTime`);
I believe (though I could easily be wrong) that the other items in the query plan either aren't really a concern or can't be eliminated. The filesort, as an example of the latter, is likely due to the GROUP BY. In other words, this is likely the extent of what you can do with this particular query, though radically different queries or approaches that address the table storage format may still be more efficient.
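As an illustration of what a radically different query could look like (a hypothetical sketch, not something tested against this schema): instead of joining every second against every interval, you could count starts and ends per second and derive the number of active rows as a running total with user variables:

```sql
SET @active := 0;

SELECT `second`, (@active := @active + `delta`) AS `active`
FROM (
    -- +1 for each interval that starts at a given second
    SELECT `Timestamp` AS `second`, COUNT(*) AS `delta`
    FROM `logs` GROUP BY `Timestamp`
    UNION ALL
    -- -1 one second after each interval ends (BETWEEN is inclusive of EndTime)
    SELECT `EndTime` + INTERVAL 1 SECOND, -COUNT(*)
    FROM `logs` GROUP BY `EndTime`
) AS `events`
ORDER BY `second`;
```

Note this only emits seconds at which something changes; seconds with no starts or ends would need to be filled in (e.g. from the numbers table), and relying on user-variable assignment order alongside ORDER BY can be fragile in some MySQL versions.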
Upvotes: 5
Reputation: 1998
You could look at MERGE tables to speed up processing. With MERGE tables, since the data is split across several underlying tables, each index is smaller, resulting in faster lookups. Also, if you have multiple processors, the searches can happen in parallel, increasing performance.
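For illustration, a sketch of what that might look like. The column definitions are assumed from the question (only Timestamp, Duration, and EndTime were mentioned), and note that all underlying tables must use the MyISAM engine with identical structure:

```sql
-- Hypothetical monthly partitions of the logs table
-- (all must be MyISAM with identical columns and indexes)
CREATE TABLE `logs_2010_03` LIKE `logs`;
CREATE TABLE `logs_2010_04` LIKE `logs`;

-- The MERGE table presents the underlying tables as one logical table
CREATE TABLE `logs_all` (
    `Timestamp` DATETIME NOT NULL,
    `Duration`  INT NOT NULL,
    `EndTime`   DATETIME NOT NULL,
    KEY (`Timestamp`),
    KEY (`EndTime`)
) ENGINE=MERGE UNION=(`logs_2010_03`, `logs_2010_04`) INSERT_METHOD=LAST;
```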
Upvotes: 2