Reputation: 1999
Consider the following query...
SELECT
*
,CAST(
(CurrentSampleDateTime - PreviousSampleDateTime) AS FLOAT
) * 24.0 * 60.0 AS DeltaMinutes
FROM
(
SELECT
C.SampleDateTime AS CurrentSampleDateTime
,C.Location
,C.CurrentValue
,(
SELECT TOP 1
Previous.SampleDateTime
FROM Samples AS Previous
WHERE
Previous.Location = C.Location
AND Previous.SampleDateTime < C.SampleDateTime
ORDER BY Previous.SampleDateTime DESC
) AS PreviousSampleDateTime
FROM Samples AS C
) AS TempResults
Assuming all things being equal such as indexing, etc is this the most efficient way of achieving the above results? That is using a SubQuery to retrieve the last record?
Would I be better off creating a cursor that orders by Location, SampleDateTime and setting up variables for CurrentSampleDateTime and PreviousSampleDateTime...setting the Previous to the Current at the bottom of the while loop?
I'm not very good with CTE's is this something that could be accomplished more efficiently with a CTE? If so what would that look like?
I'm likely going to have to retrieve PreviousValue along with Previous SampleDateTime in order to get an average of the two. Does that change the results any.
Long story short what is the best/most efficient way of holding onto the values of a previous record if you need to use those values in calculations on the current record?
----UPDATE I should note that I have a clustered index on Location, SampleDateTime, CurrentValue so maybe that is what is affecting the results more than anything.
with 5,591,571 records my query (the one above) on average takes 3 mins and 20 seconds
The CTE that Joachim Isaksson below on average is taking 5 mins and 15 secs.
Maybe it's taking longer because it's not using the clustered index but is using the rownumber for the joins?
I started testing the cursor method but it's already at 10 minutes...so no go on that one.
I'll give it a day or so but think I will accept the CTE answer provided by Joachim Isaksson just because I found a new method of getting the last row.
Can anyone concur that it's the index on Location, SampleDateTime, CurrentValue that is making the subquery method faster?
I don't have SQL Server 2012 so can't test the LEAD/LAG method. I'd bet that would be quicker than anything I've tried assuming Microsoft implemented that efficiently. Probably just have to swap a pointer to a memory reference at the end of each row.
Upvotes: 0
Views: 648
Reputation: 469
If you are using SQL Server 2012, you can use the LAG window function that retrieves the value of the specified column from the previous row. It returns null if there is no previous row.
SELECT
a.*,
CAST((a.SampleDateTime - LAG(a.SampleDateTime) OVER(PARTITION BY a.location ORDER BY a.SampleDateTime ASC)) AS FLOAT)
* 24.0 * 60.0 AS DeltaMinutes
FROM samples a
ORDER BY
a.location,
a.SampleDateTime
You'd have to run some tests to see if it's faster. If you're not using SQL Server 2012 then at least this may give others an idea of how it can be done with 2012. I like @Joachim Isaksson 's answer using a CTE with a Row_Number()/Partition By for 2008 and 2005.
Have you considered creating a temp table to use instead of a CTE or subquery? You can create indexes on the temp table that are more suited for the join on RowNumber.
CREATE TABLE #tmp (
RowNumber INT,
Location INT,
SampleDateTime DATETIME,
CurrentValue INT)
;
INSERT INTO #tmp
SELECT
ROW_NUMBER() OVER (PARTITION BY Location
ORDER BY SampleDateTime DESC) rn,
Location,
SampleDateTime,
CurrentValue
FROM Samples
;
CREATE INDEX idx_location_row ON #tmp(Location,RowNumber) INCLUDE (SampleDateTime,CurrentValue);
SELECT
a.Location,
a.SampleDateTime,
a.CurrentValue,
CAST((a.SampleDateTime - b.SampleDateTime) AS FLOAT) * 24.0 * 60.0 AS DeltaMinutes
FROM #tmp a
LEFT JOIN #tmp b ON
a.Location = b.Location
AND b.RowNumber = a.RowNumber +1
ORDER BY
a.Location,
a.SampleDateTime
Upvotes: 1
Reputation: 180917
As always, testing with your real data is king.
Here's a CTE version that shows the samples for each location with time deltas from the previous sample. It uses OVER
ranking, which usually does well in comparison to subqueries for solving the same problem.
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Location
ORDER BY SampleDateTime DESC) rn
FROM Samples
)
SELECT a.*,CAST((a.SampleDateTime - b.SampleDateTime) AS FLOAT)
* 24.0 * 60.0 AS DeltaMinutes
FROM cte a
LEFT JOIN cte b ON a.Location = b.Location AND b.rn = a.rn +1
Upvotes: 1