Mozzis
Mozzis

Reputation: 419

How can I improve the speed of this SQL query?

We have two devices collecting data at roughly 30-second intervals. The devices are located at two widely-spaced sites. The absolute time of each collection for each site can vary +/- 30 seconds. Occasionally, a site will go offline for various reasons. The data from each device represents a different kind of measurement, e.g. temperature from device1 and humidity from device2. A process records the data from device1 and device2 into separate tables in a SQL Server 2012 Express database running on a server separate from each device.

It is desired to present the data from both devices correlated into records which will contain columns with the value for site1 for a particular date/time, combined with the data for site2 if any is available. User programs will then request recordsets for a specified date/time range. To this end, I constructed the following SP:

ALTER PROCEDURE [db_datareader].[DataJoinDateRange] 
@DateFrom DateTime = '2014-05-15 15:10:24.000', 
@DateTo DateTime = '2014-06-15 15:10:24.000' 
AS
BEGIN
SET NOCOUNT ON;
WITH site1(id, date_time, dataval)
AS
(
    SELECT * 
    FROM site1_data 
    WHERE site1_data.date_time BETWEEN @DateFrom AND @DateTo
),
site2(id, date_time, datavaql)
AS
(
    SELECT *
    FROM site2_data
    WHERE site2_data.date_time BETWEEN @DateFrom AND @DateTo
)
SELECT * from site1 site1_res
INNER JOIN (select id, date_time, data_val) site2_res
on ABS(DATEDIFF("SECOND", site1_res.date_time, site_2_res.date_time)) < 30
END

The intent is to first select out records in the desired date/time range, and then join records from site1 to those in site2 which are within the +/- 30 sec. range. The resulting recordset will contain data from both devices, or nulls when no corresponding record exists.

This seems to work: records with the desired form are output and correspond to the correct records in each table. But the execution is very slow. A query over a date range of a few weeks takes about 1 minute and 30 seconds. Site1 contains about 5000 records in this date range, while Site2 contains only 1 record. A SELECT query on the date range only for each table executes in well under a second.

I have never delved very deeply into SQL before, but our small group has no one else at this time to do this task. Can anyone give me an idea of the proper way to do this, or at least how to accelerate this SP?

Upvotes: 1

Views: 118

Answers (2)

Vladimir Baranov
Vladimir Baranov

Reputation: 32695

You can try to improve your solution by making better use of the index on the date_time column.

ABS(S1 - S2) < 30

is equivalent to

ABS(S2 - S1) < 30

<=>

-30 < S2 - S1 < 30

<=>

S2 - S1 < 30
AND
S2 - S1 > -30

<=>

S2 < S1 + 30
AND
S2 > S1 - 30

You don't really need the first CTE, though it should not hurt. But, the WHERE clause inside the CROSS APPLY is better to write like this. Also, you should use OUTER APPLY instead of CROSS APPLY if you want to see data from site1, which doesn't have any corresponding data from site2. Now the site2.date_time is not inside the function call and optimizer can use index on this column.

ALTER PROCEDURE [dbo].[SPJoinDateRange]
    @DateFrom DateTime = '2014-05-01 15:10:24.000', 
    @DateTo   DateTime = '2014-07-31 15:10:00.000'
AS
BEGIN
    SET NOCOUNT ON;

    SELECT
        site1_data.id AS id1
        ,site1_data.date_time AS date_time1
        ,site1_data.data_val1
        ,CA_site2.id2
        ,CA_site2.date_time2
        ,CA_site2.data_val2
    FROM
        site1_data
        OUTER APPLY
        (
            SELECT
                site2_data.id as id2
                ,site2_data.date_time as date_time2
                ,site2_data.data_val2
            FROM
                site2_data
            WHERE
                site2.date_time BETWEEN @DateFrom AND @DateTo
                AND site2.date_time < DATEADD(second, +30, site1_data.date_time)
                AND site2.date_time > DATEADD(second, -30, site1_data.date_time)
        ) AS CA_site2
    WHERE
        site1_data.date_time BETWEEN @DateFrom AND @DateTo
    ;

END

It will work even faster if you can add an extra column that would contain your timestamps rounded to the nearest 30 seconds. Or round existing values in place if you don't need the precise timestamps.

If we add a column called date_time_rounded, which contains original timestamp rounded to 30 seconds, create index on it, then the query will look like this:

ALTER PROCEDURE [dbo].[SPJoinDateRange]
    @DateFrom DateTime = '2014-05-01 15:10:24.000', 
    @DateTo   DateTime = '2014-07-31 15:10:00.000'
AS
BEGIN
    SET NOCOUNT ON;

    SELECT
        site1_data.id AS id1
        ,site1_data.date_time AS date_time1
        ,site1_data.data_val1
        ,site2_data.id AS id2
        ,site2_data.date_time AS date_time2
        ,site2_data.data_val2
    FROM
        site1_data
        LEFT JOIN site2_data ON site2_data.date_time_rounded = site1_data.date_time_rounded
    WHERE
        site1_data.date_time BETWEEN @DateFrom AND @DateTo
    ;

END

To round the date_time to nearest 30 seconds you can use something like this:

DATEADD(second, 30 * ROUND(DATEDIFF(second, '20010101', date_time)/30.0, 0), '20010101')

It calculates number of seconds from 2001-01-01 to the given date_time, divides them by 30, rounds the result to integer, multiplies result by 30, adds this number of second to 2001-01-01.

Run this few times to see how it works:

SELECT 
GETDATE() as original, 
DATEADD(second, 30 * ROUND(DATEDIFF(second, '20010101', GETDATE())/30.0, 0), '20010101') AS rounded

Upvotes: 1

Mozzis
Mozzis

Reputation: 419

I found this article elsewhere, and it was quite helpful. As a result, I altered the SP to what follows:

ALTER PROCEDURE [dbo].[SPJoinDateRange]
@DateFrom DateTime = '2014-05-01 15:10:24.000', 
@DateTo   DateTime = '2014-07-31 15:10:00.000'
AS
BEGIN
SET NOCOUNT ON;
WITH site1(id, date_time, data_val1)
AS
(
    SELECT * 
    FROM site1_data 
    WHERE site1_data.date_time BETWEEN @DateFrom AND @DateTo
)
SELECT * FROM site1
CROSS APPLY 
(
    SELECT id as id1, date_time as date_time1, data_val2
    FROM site2_data AS site2
    WHERE site2.date_time BETWEEN @DateFrom AND @DateTo 
    AND 
    ABS(DATEDIFF("SECOND", site1.date_time, site2.date_time)) < 30
) 
AS result
END

The result time for this query is 6 sec. (compared to 90 sec. for the previous version.) This may still be much slower than is possible; my next task would ideally be to understand just why this approach is faster. The laconic answer (and link) from Sean Lange no doubt furnishes some clues. Of course I will have to defer that and push on to the next task in our initial implementation.

My thanks to all who so rapidly responded to my question.

Upvotes: 0

Related Questions