Gidil
Gidil

Reputation: 4137

LAG and LEAD functions

What are the advantages of using the new LAG and LEAD functions in SQL Server 2012?
Is it simply a matter of easier to write ans simpler to debug queries or is there also a performance improvement?

This is important for me, since we require this type of functionality very often and I need to know if we should recommend an upgrade in the near future.
If it's only easier queries, it wont be worth the hassle (and costs) of upgrading.

Upvotes: 4

Views: 7984

Answers (2)

milivojeviCH
milivojeviCH

Reputation: 1598

To demonstrate a difference in the execution plan, I've used the winning solution from Dave's SQL Authority blog:

;WITH T1
AS (SELECT row_number() OVER (ORDER BY SalesOrderDetailID) N
         , s.SalesOrderID
         , s.SalesOrderDetailID
    FROM
        TempDB.dbo.LAG s
    WHERE
        SalesOrderID IN (20120303, 20120515, 20120824, 20121031))
SELECT SalesOrderID
     , SalesOrderDetailID AS CurrentSalesOrderDetailID
/*   , CASE
           WHEN N % 2 = 1 THEN
               max(CASE
                   WHEN N % 2 = 0 THEN
                       SalesOrderDetailID
               END) OVER (PARTITION BY (N + 1) / 2)
           ELSE
               max(CASE
                   WHEN N % 2 = 1 THEN
                       SalesOrderDetailID
               END) OVER (PARTITION BY N / 2)
       END LeadVal */
     , CASE
           WHEN N % 2 = 1 THEN
               max(CASE
                   WHEN N % 2 = 0 THEN
                       SalesOrderDetailID
               END) OVER (PARTITION BY N / 2)
           ELSE
               max(CASE
                   WHEN N % 2 = 1 THEN
                       SalesOrderDetailID
               END) OVER (PARTITION BY (N + 1) / 2)
       END PreviousSalesOrderDetailID
FROM
    T1
ORDER BY
    SalesOrderID
  , SalesOrderDetailID;



SELECT SalesOrderID
     , SalesOrderDetailID AS CurrentSalesOrderDetailID
     , LAG(SalesOrderDetailID, 1, 0) OVER (ORDER BY SalesOrderID, SalesOrderDetailID) AS PreviousSalesOrderDetailID
FROM TempDB.dbo.LAG
WHERE SalesOrderID  IN (20120303, 20120515, 20120824, 20121031);





Warning: Null value is eliminated by an aggregate or other SET operation.

(10204 row(s) affected)
Table 'Worktable'. Scan count 6, logical reads 81638, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'LAG'. Scan count 4, logical reads 48, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:   CPU time = 297 ms,  elapsed time = 332 ms.

--- versus ---

(10204 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'LAG'. Scan count 4, logical reads 48, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:   CPU time = 78 ms,  elapsed time = 113 ms.

Next to being way more elegant, it consumes much less resources.

Here's the comparison of graphical execution plans:

Execution Plans Show Clear Winner in This Specific Case

Execution plans show one clear winner in this specific case. Dave's page has many possible different ways to get LEAD/LAG functionality. Maybe some of them would beat the SQL Server's internal solution. Or, maybe not.

Upvotes: 5

Bruno
Bruno

Reputation: 122609

I can't comment much on MS SQL Server 2012, but from a PostgreSQL point of view, these functions have been available since version 8.4.

In general, they're very convenient to detect change (typically, in a time series, in conjunction with ORDER BY). Typically:

WITH shifted_timeseries AS (
    SELECT event_time,
           value,
           LAG(value) OVER (ORDER BY event_time) AS lagged_value
        FROM timeseries
)
SELECT event_time AS change_time, value AS new_value
FROM shifted_timeseries
    WHERE value != lagged_value;

For this sort of thing, in terms of clarity alone, they're worth it (although that's possibly subjective).

For more complex operations, for example, if you want time periods of consecutive values, this answer is a very solution elegant solution to this problem. It appears to work just fine in SQL Server 2012 according to this SQLFiddle.

These two blog entries also show a comparison between using LEAD/LAG and doing the same query without:

(It would be interesting to compare the execution plans.)

Upvotes: 3

Related Questions