TSQL Last Record Efficiency Cursor, SubQuery, or CTE

Question

Consider the following query...

SELECT
     *
    ,CAST(
            (CurrentSampleDateTime - PreviousSampleDateTime) AS FLOAT
        ) * 24.0 * 60.0 AS DeltaMinutes
FROM    
(   
    SELECT
         C.SampleDateTime AS CurrentSampleDateTime
        ,C.Location
        ,C.CurrentValue
        ,(
            SELECT TOP 1 
                Previous.SampleDateTime
            FROM Samples AS Previous
            WHERE 
                    Previous.Location = C.Location
                AND Previous.SampleDateTime < C.SampleDateTime
            ORDER BY Previous.SampleDateTime DESC       
        ) AS PreviousSampleDateTime
    FROM Samples AS C
) AS TempResults

Assuming all else is equal (indexing and so on), is this the most efficient way of achieving the above results? That is, is a subquery the best way to retrieve the last (previous) record?

Would I be better off creating a cursor that orders by Location, SampleDateTime and setting up variables for CurrentSampleDateTime and PreviousSampleDateTime, then setting Previous to Current at the bottom of the WHILE loop?

I'm not very good with CTEs. Is this something that could be accomplished more efficiently with a CTE? If so, what would that look like?

I'm likely going to have to retrieve PreviousValue along with PreviousSampleDateTime in order to get an average of the two. Does that change the answer at all?

Long story short: what is the best/most efficient way of holding onto the values of a previous record if you need to use those values in calculations on the current record?

---- UPDATE: I should note that I have a clustered index on Location, SampleDateTime, CurrentValue, so maybe that is what is affecting the results more than anything.

With 5,591,571 records, my query (the one above) takes 3 minutes and 20 seconds on average.

The CTE that Joachim Isaksson posted below takes 5 minutes and 15 seconds on average.

Maybe it's taking longer because it's not using the clustered index but is using the rownumber for the joins?

I started testing the cursor method, but it's already at 10 minutes, so that one is a no go.

I'll give it a day or so, but I think I will accept the CTE answer provided by Joachim Isaksson, just because it showed me a new method of getting the last row.

Can anyone confirm that it's the index on Location, SampleDateTime, CurrentValue that is making the subquery method faster?

I don't have SQL Server 2012, so I can't test the LEAD/LAG method. I'd bet that would be quicker than anything I've tried, assuming Microsoft implemented it efficiently. It probably just has to swap a pointer to a memory reference at the end of each row.

David Nhim · Accepted Answer

If you are using SQL Server 2012 or later, you can use the LAG window function, which retrieves the value of the specified column from the previous row. It returns NULL if there is no previous row.

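SELECT
    a.*,
    -- DATETIME minus DATETIME cast to FLOAT gives days; * 24 * 60 converts to minutes.
    -- LAG returns NULL for the first row of each location, so DeltaMinutes is NULL there.
    CAST((a.SampleDateTime
          - LAG(a.SampleDateTime) OVER (PARTITION BY a.Location ORDER BY a.SampleDateTime ASC)
         ) AS FLOAT) * 24.0 * 60.0 AS DeltaMinutes
FROM Samples a
ORDER BY
    a.Location,
    a.SampleDateTime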
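
The question also mentions needing PreviousValue to average with the current reading. A minimal sketch of how the same window could return both previous columns, assuming SQL Server 2012 or later, a DATETIME SampleDateTime column and a numeric CurrentValue column, and reading "an average of the two" as the average of the current and previous values. The aliases p, PreviousValue and AverageValue are made up for illustration:

```sql
-- Sketch only: assumes SQL Server 2012+ and the Samples table from the question.
SELECT
    p.Location,
    p.SampleDateTime,
    p.CurrentValue,
    p.PreviousValue,
    -- NULL for the first row of each location, because PreviousValue is NULL there.
    (p.CurrentValue + p.PreviousValue) / 2.0 AS AverageValue,
    CAST((p.SampleDateTime - p.PreviousSampleDateTime) AS FLOAT) * 24.0 * 60.0 AS DeltaMinutes
FROM
(
    SELECT
        s.Location,
        s.SampleDateTime,
        s.CurrentValue,
        -- Both LAG calls use the same partition and ordering, so they read the same previous row.
        LAG(s.SampleDateTime) OVER (PARTITION BY s.Location ORDER BY s.SampleDateTime) AS PreviousSampleDateTime,
        LAG(s.CurrentValue)   OVER (PARTITION BY s.Location ORDER BY s.SampleDateTime) AS PreviousValue
    FROM Samples AS s
) AS p
ORDER BY
    p.Location,
    p.SampleDateTime;
```

Computing the LAG columns in a derived table keeps the outer arithmetic readable; whether this is faster than the subquery would still need testing.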

You'd have to run some tests to see if it's faster. If you're not using SQL Server 2012, then at least this may give others an idea of how it can be done with 2012. I like @Joachim Isaksson's answer using a CTE with ROW_NUMBER()/PARTITION BY for 2008 and 2005.

SQL Fiddle
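
For SQL Server 2005 and 2008, where LAG is not available, the ROW_NUMBER()/PARTITION BY idea could look roughly like the following. This is a sketch of the general technique, not a copy of the referenced answer; the CTE name Numbered and the aliases cur, prev and rn are made up for illustration, and SampleDateTime is assumed to be a DATETIME column:

```sql
-- Sketch only: number the rows per location, then join each row to the one before it.
WITH Numbered AS
(
    SELECT
        Location,
        SampleDateTime,
        CurrentValue,
        ROW_NUMBER() OVER (PARTITION BY Location ORDER BY SampleDateTime) AS rn
    FROM Samples
)
SELECT
    cur.Location,
    cur.SampleDateTime,
    cur.CurrentValue,
    prev.CurrentValue AS PreviousValue,
    CAST((cur.SampleDateTime - prev.SampleDateTime) AS FLOAT) * 24.0 * 60.0 AS DeltaMinutes
FROM Numbered AS cur
-- LEFT JOIN keeps the first row of each location, with NULLs for the previous columns.
LEFT JOIN Numbered AS prev
    ON  prev.Location = cur.Location
    AND prev.rn = cur.rn - 1
ORDER BY
    cur.Location,
    cur.SampleDateTime;
```

Because the join is on the computed row number and not on the indexed columns, the optimizer may not be able to use the clustered index for the join, which would be consistent with the timings reported in the question.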

Have you considered creating a temp table to use instead of a CTE or subquery? You can create indexes on the temp table that are better suited to the join on RowNumber.

CREATE TABLE #tmp (
  RowNumber INT,
  Location INT,
  SampleDateTime DATETIME,
  CurrentValue INT)
;

INSERT INTO #tmp
 SELECT 
  ROW_NUMBER() OVER (PARTITION BY Location 
                           ORDER BY SampleDateTime DESC) rn,
  Location,
  SampleDateTime,
  CurrentValue
 FROM Samples
;

CREATE INDEX idx_location_row ON #tmp(Location,RowNumber) INCLUDE (SampleDateTime,CurrentValue);

SELECT 
 a.Location,
 a.SampleDateTime,
 a.CurrentValue,
 CAST((a.SampleDateTime - b.SampleDateTime) AS FLOAT) * 24.0 * 60.0 AS DeltaMinutes
FROM #tmp a
LEFT JOIN #tmp b ON 
 a.Location = b.Location 
 AND b.RowNumber = a.RowNumber +1  
ORDER BY
 a.Location, 
 a.SampleDateTime

SQL Fiddle #2
