Reputation: 301
I have a Microsoft SQL Server 2019 database. There is a table called Employee that has 3 columns: EmployeeId, Salary, Title. There is also another table called Audit that stores history of changes in Employee table. The Audit table has 5 columns: AuditId, EmployeeId, Salary, Title, Timestamp. The Salary & Title columns in the Audit table are assigned values only if they were changed in the Employee table otherwise they are set to NULL. Could you please help me to write SQL script that parses Audit table and populates NULL values from the previous rows in Audit table? The NULL audit values left unassigned after the 1st step can be set from the Employee table (as they were never modified).
Create statements for tables
CREATE TABLE [dbo].[Employee](
[EmployeeId] [int] IDENTITY(1,1) NOT NULL,
[Salary] [int] NULL,
[Title] [nvarchar](50) NULL
) ON [PRIMARY]
CREATE TABLE [dbo].[Audit](
[Auditid] [int] IDENTITY(1,1) NOT NULL,
[EmployeeId] [int] NOT NULL,
[Salary] [int] NULL,
[Title] [nvarchar](50) NULL,
[Timestamp] [datetime] NOT NULL
) ON [PRIMARY]
Here is my own solution but I am not sure if it is correct. Will need more testing on larger dataset:
-- Step 1: Use a CTE to create a running total with the last non-NULL values
WITH CTE AS (
SELECT
AuditId,
EmployeeId,
Salary,
Title,
Timestamp,
-- Create a running total of the last non-NULL value using the MAX() window function
MAX(Salary) OVER (PARTITION BY EmployeeId ORDER BY Timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS FilledSalary,
MAX(Title) OVER (PARTITION BY EmployeeId ORDER BY Timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS FilledTitle
FROM
Audit
)
-- Step 2: Update the original table with the values from the CTE
UPDATE Audit
SET
Salary = CTE.FilledSalary,
Title = CTE.FilledTitle
FROM
Audit
JOIN
CTE ON Audit.AuditId = CTE.AuditId;
Upvotes: 0
Views: 70
Reputation: 3335
Classic "fill-down" or "forward-fill". This question is a duplicate, I think. These are good answers to similar questions:
https://stackoverflow.com/a/47775051/2331665
https://stackoverflow.com/a/54889650/2331665
This solution here makes use of a cte, which creates a unique group id for each value and potential subsequent NULLs. The final query finds the Maximum of the values in each group. This would also work with Minimum, since all the NULLs in the group are ignored and there is only one non-null value.
DROP TABLE IF EXISTS #Employee
CREATE TABLE #Employee (
[EmployeeId] int
,Salary int
,Title nvarchar(100)
)
INSERT INTO #Employee(
EmployeeId
,Salary
,Title
)
VALUES
(1, 10, 'Lvl1')
,(2,100, 'Lvl1')
DROP TABLE IF EXISTS #Audit
create table [#Audit] (
[Auditid] int
,[EmployeeId] int
,Salary int
,Title nvarchar(100)
,Timestamp datetime2(0)
)
INSERT INTO #Audit(
Auditid
,EmployeeId
,Salary
,Title
,Timestamp
)
VALUES
(1, 1, 11, NULL, '2024-06-01')
,(2, 2, 101, NULL, '2024-06-02')
,(3, 1, NULL, 'Lvl2', '2024-06-03')
,(4, 2, NULL, 'Lvl2', '2024-06-04')
;WITH grps AS(
SELECT
*
,Salary_grp = SUM(IIF(Salary IS NULL, 0, 1))
OVER( PARTITION BY EmployeeId
ORDER BY Auditid
ROWS UNBOUNDED PRECEDING )
,Title_grp = SUM(IIF(Title IS NULL, 0, 1))
OVER( PARTITION BY EmployeeId
ORDER BY Auditid
ROWS UNBOUNDED PRECEDING )
FROM #Audit
)
SELECT
Auditid
,EmployeeId
,Salary_fd = COALESCE(
MAX(o.Salary)
OVER( PARTITION BY EmployeeId, Salary_grp
ORDER BY Auditid
ROWS UNBOUNDED PRECEDING )
,init_Salary.Salary )
,Title_fd = COALESCE(
MAX(o.Title )
OVER( PARTITION BY EmployeeId, Title_grp
ORDER BY Auditid
ROWS UNBOUNDED PRECEDING )
,init_Title.Title )
,Timestamp
FROM grps o
OUTER APPLY(SELECT
TOP 1
i.Salary
FROM #Employee i
WHERE i.EmployeeId = o.EmployeeId
AND i.Salary IS NOT NULL )init_Salary
OUTER APPLY(SELECT
TOP 1
i.Title
FROM #Employee i
WHERE i.EmployeeId = o.EmployeeId
AND i.Title IS NOT NULL )init_Title
ORDER BY
o.Auditid
The following is a solution using a self join with APPLY. This will likely perform slower.
SELECT
o.[Auditid]
,o.[EmployeeId]
,[Salary] = COALESCE(o.Salary, filldown_salary.Salary, init_Salary.Salary)
,[Title] = COALESCE(o.Title , filldown_title.Title , init_Title.Title )
,o.[Timestamp]
FROM [Audit] o
OUTER APPLY(SELECT
TOP 1
i.[Salary]
FROM [Audit] i
WHERE i.[EmployeeId] = o.[EmployeeId]
AND i.Auditid < o.Auditid
AND i.[Salary] IS NOT NULL
ORDER BY i.[Timestamp] DESC )filldown_salary
OUTER APPLY(SELECT
TOP 1
i.[Title]
FROM [Audit] i
WHERE i.[EmployeeId] = o.[EmployeeId]
AND i.Auditid < o.Auditid
AND i.[Title] IS NOT NULL
ORDER BY i.[Timestamp] DESC )filldown_title
OUTER APPLY(SELECT
TOP 1
i.Salary
FROM Employee i
WHERE i.EmployeeId = o.EmployeeId
AND i.Salary IS NOT NULL )init_Salary
OUTER APPLY(SELECT
TOP 1
i.Title
FROM Employee i
WHERE i.EmployeeId = o.EmployeeId
AND i.Title IS NOT NULL )init_Title
Upvotes: 0