Reputation: 4137
What are the advantages of using the new LAG and LEAD functions in SQL Server 2012?
Is it simply a matter of easier to write ans simpler to debug queries or is there also a performance improvement?
This is important for me, since we require this type of functionality very often and I need to know if we should recommend an upgrade in the near future.
If it's only easier queries, it wont be worth the hassle (and costs) of upgrading.
Upvotes: 4
Views: 7984
Reputation: 1598
To demonstrate a difference in the execution plan, I've used the winning solution from Dave's SQL Authority blog:
;WITH T1
AS (SELECT row_number() OVER (ORDER BY SalesOrderDetailID) N
, s.SalesOrderID
, s.SalesOrderDetailID
FROM
TempDB.dbo.LAG s
WHERE
SalesOrderID IN (20120303, 20120515, 20120824, 20121031))
SELECT SalesOrderID
, SalesOrderDetailID AS CurrentSalesOrderDetailID
/* , CASE
WHEN N % 2 = 1 THEN
max(CASE
WHEN N % 2 = 0 THEN
SalesOrderDetailID
END) OVER (PARTITION BY (N + 1) / 2)
ELSE
max(CASE
WHEN N % 2 = 1 THEN
SalesOrderDetailID
END) OVER (PARTITION BY N / 2)
END LeadVal */
, CASE
WHEN N % 2 = 1 THEN
max(CASE
WHEN N % 2 = 0 THEN
SalesOrderDetailID
END) OVER (PARTITION BY N / 2)
ELSE
max(CASE
WHEN N % 2 = 1 THEN
SalesOrderDetailID
END) OVER (PARTITION BY (N + 1) / 2)
END PreviousSalesOrderDetailID
FROM
T1
ORDER BY
SalesOrderID
, SalesOrderDetailID;
SELECT SalesOrderID
, SalesOrderDetailID AS CurrentSalesOrderDetailID
, LAG(SalesOrderDetailID, 1, 0) OVER (ORDER BY SalesOrderID, SalesOrderDetailID) AS PreviousSalesOrderDetailID
FROM TempDB.dbo.LAG
WHERE SalesOrderID IN (20120303, 20120515, 20120824, 20121031);
Warning: Null value is eliminated by an aggregate or other SET operation.
(10204 row(s) affected)
Table 'Worktable'. Scan count 6, logical reads 81638, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'LAG'. Scan count 4, logical reads 48, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times: CPU time = 297 ms, elapsed time = 332 ms.
--- versus ---
(10204 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'LAG'. Scan count 4, logical reads 48, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times: CPU time = 78 ms, elapsed time = 113 ms.
Next to being way more elegant, it consumes much less resources.
Here's the comparison of graphical execution plans:
Execution plans show one clear winner in this specific case. Dave's page has many possible different ways to get LEAD/LAG functionality. Maybe some of them would beat the SQL Server's internal solution. Or, maybe not.
Upvotes: 5
Reputation: 122609
I can't comment much on MS SQL Server 2012, but from a PostgreSQL point of view, these functions have been available since version 8.4.
In general, they're very convenient to detect change (typically, in a time series, in conjunction with ORDER BY
). Typically:
WITH shifted_timeseries AS (
SELECT event_time,
value,
LAG(value) OVER (ORDER BY event_time) AS lagged_value
FROM timeseries
)
SELECT event_time AS change_time, value AS new_value
FROM shifted_timeseries
WHERE value != lagged_value;
For this sort of thing, in terms of clarity alone, they're worth it (although that's possibly subjective).
For more complex operations, for example, if you want time periods of consecutive values, this answer is a very solution elegant solution to this problem. It appears to work just fine in SQL Server 2012 according to this SQLFiddle.
These two blog entries also show a comparison between using LEAD/LAG and doing the same query without:
(It would be interesting to compare the execution plans.)
Upvotes: 3