ClassyBear
ClassyBear

Reputation: 223

T-SQL LAG function default value

Why does the default parameter for the LAG function only apply to the first column of the returning results if there are no records in the joining table and you are only referencing the joining table fields?

To explain this better I have created the following scenario.

Schema

  CREATE TABLE Blogs(
    Id int IDENTITY(1,1) CONSTRAINT PK_Blogs_Id PRIMARY KEY,
    Title NVARCHAR(1000)
  )
  
  CREATE TABLE Comments(
    Id int IDENTITY(1,1) CONSTRAINT PK_Comments_Id PRIMARY KEY,
    BlogId INT NOT NULL,
    CommentText NVARCHAR(max)
  )
  
  INSERT INTO Blogs (Title) VALUES ('Blog 1')
  INSERT INTO Blogs (Title) VALUES ('Blog 2')
  INSERT INTO Blogs (Title) VALUES ('Blog 3')
  INSERT INTO Blogs (Title) VALUES ('Blog 4')
  INSERT INTO Blogs (Title) VALUES ('Blog 5')
  
  INSERT INTO Comments (BlogId, CommentText) VALUES (4,'Some text')
  INSERT INTO Comments (BlogId, CommentText) VALUES (4,'Some text 2')

Query

SELECT *, 
  LAG(CommentText,1,'No comment') OVER (Partition by Comments.BlogId ORDER BY Comments.Id Desc) LastComment
FROM Blogs LEFT JOIN Comments on Blogs.Id = Comments.BlogId;

In the query above it will return results with 'No comment' for LastComment on the first row and where the row has other comments, the rest will be null.

I know it works correctly (all rows that are null returns 'No comment' in the LastComment field) if you reference the key of Blogs in the window function (query below) but I am trying to understand why if the join returns a null, why doesn't it apply the default parameter in the LastComment LAG function.

SELECT *, 
  LAG(CommentText,1,'No comment') OVER (Partition by Blogs.Id ORDER BY Comments.Id Desc) LastComment
FROM Blogs LEFT JOIN Comments on Blogs.Id = Comments.BlogId;

Here is a SQL fiddle of the scenario http://sqlfiddle.com/#!18/eb850/9

Edit: To clarify; the question is why in the SQL fiddle scenario does Blog 1 return the default value 'No comment' where the rest of the Blogs that also do not have any comments return NULL. Why is the first row getting the default value assigned and the rest are not shouldn't Blog 1 also return NULL as there are no records for the LAG to read?

Upvotes: 3

Views: 1537

Answers (1)

Metarineo
Metarineo

Reputation: 21

In SQL Server, the LAG() function is used to access a preceding row within a result set, based on an ordering column. It has an optional default parameter, which specifies the value to be returned if there is no preceding row.

If you are using the LAG() function to access columns from a joining table, and there are no records in the joining table that match the current row in the first table, the default parameter will only apply to the first column of the returning results. This is because the LAG() function is designed to return a single value for each row in the result set, and it is not possible to return multiple values for a single row if there are no matching rows in the joining table.

For example, consider the following query:

SELECT t1.id, t1.data, t2.data
FROM table1 t1
LEFT JOIN table2 t2
ON t1.id = t2.id

If there are no matching rows in table2 for a particular row in table1, the t2.data column will be NULL for that row. If you wanted to use the LAG() function to access the preceding value of t2.data, you could use the following query:

SELECT t1.id, t1.data, t2.data, LAG(t2.data) OVER (ORDER BY t1.id) AS prev_data
FROM table1 t1
LEFT JOIN table2 t2
ON t1.id = t2.id

In this case, the LAG() function will return the preceding value of t2.data for each row in the result set, based on the ordering specified in the OVER() clause. If there is no preceding row, the default value specified in the LAG() function will be used. This default value will only apply to the prev_data column, since it is the only column being accessed using the LAG() function. The other columns in the result set (t1.id, t1.data, and t2.data) will not be affected by the default value.

Upvotes: 0

Related Questions