myforums
myforums

Reputation: 301

T-SQL script to populate NULL values in the Audit table

I have a Microsoft SQL Server 2019 database. There is a table called Employee that has 3 columns: EmployeeId, Salary, Title. There is also another table called Audit that stores history of changes in Employee table. The Audit table has 5 columns: AuditId, EmployeeId, Salary, Title, Timestamp. The Salary & Title columns in the Audit table are assigned values only if they were changed in the Employee table otherwise they are set to NULL. Could you please help me to write SQL script that parses Audit table and populates NULL values from the previous rows in Audit table? The NULL audit values left unassigned after the 1st step can be set from the Employee table (as they were never modified).

enter image description here

enter image description here

Create statements for tables

CREATE TABLE [dbo].[Employee](
    [EmployeeId] [int] IDENTITY(1,1) NOT NULL,
    [Salary] [int] NULL,
    [Title] [nvarchar](50) NULL
) ON [PRIMARY]

CREATE TABLE [dbo].[Audit](
    [Auditid] [int] IDENTITY(1,1) NOT NULL,
    [EmployeeId] [int] NOT NULL,
    [Salary] [int] NULL,
    [Title] [nvarchar](50) NULL,
    [Timestamp] [datetime] NOT NULL
) ON [PRIMARY]

Here is my own solution but I am not sure if it is correct. Will need more testing on larger dataset:

-- Step 1: Use a CTE to create a running total with the last non-NULL values
    WITH CTE AS (
        SELECT
            AuditId,
            EmployeeId,
            Salary,
            Title,
            Timestamp,
            -- Create a running total of the last non-NULL value using the MAX() window function
            MAX(Salary) OVER (PARTITION BY EmployeeId ORDER BY Timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS FilledSalary,
            MAX(Title) OVER (PARTITION BY EmployeeId ORDER BY Timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS FilledTitle
        FROM
            Audit
    )
    -- Step 2: Update the original table with the values from the CTE
    UPDATE Audit
    SET
        Salary = CTE.FilledSalary,
        Title = CTE.FilledTitle
    FROM
        Audit
    JOIN
        CTE ON Audit.AuditId = CTE.AuditId;

Upvotes: 0

Views: 70

Answers (1)

Der U
Der U

Reputation: 3335

Classic "fill-down" or "forward-fill". This question is a duplicate, I think. These are good answers to similar questions:

https://stackoverflow.com/a/47775051/2331665

https://stackoverflow.com/a/54889650/2331665

This solution here makes use of a cte, which creates a unique group id for each value and potential subsequent NULLs. The final query finds the Maximum of the values in each group. This would also work with Minimum, since all the NULLs in the group are ignored and there is only one non-null value.

DROP TABLE IF EXISTS #Employee
CREATE TABLE #Employee (
[EmployeeId] int
,Salary int
,Title nvarchar(100)
)
INSERT INTO #Employee(
EmployeeId
,Salary
,Title
)
VALUES
 (1, 10, 'Lvl1')
,(2,100, 'Lvl1')


DROP TABLE IF EXISTS #Audit
create table [#Audit] (
[Auditid] int
,[EmployeeId] int
,Salary int
,Title nvarchar(100)
,Timestamp datetime2(0)
)
INSERT INTO #Audit(
Auditid
,EmployeeId
,Salary
,Title
,Timestamp
)
VALUES
 (1, 1,   11,   NULL, '2024-06-01')
,(2, 2,  101,   NULL, '2024-06-02')
,(3, 1, NULL, 'Lvl2', '2024-06-03')
,(4, 2, NULL, 'Lvl2', '2024-06-04')


;WITH grps AS(
SELECT 
  *
  ,Salary_grp = SUM(IIF(Salary IS NULL, 0, 1))
                OVER( PARTITION BY EmployeeId 
                      ORDER BY Auditid 
                      ROWS UNBOUNDED PRECEDING )
  ,Title_grp  = SUM(IIF(Title  IS NULL, 0, 1))
                OVER( PARTITION BY EmployeeId 
                      ORDER BY Auditid 
                      ROWS UNBOUNDED PRECEDING )
FROM #Audit
)
SELECT
  Auditid
  ,EmployeeId
  ,Salary_fd = COALESCE(
                 MAX(o.Salary)
                 OVER( PARTITION BY EmployeeId, Salary_grp 
                       ORDER BY Auditid 
                       ROWS UNBOUNDED PRECEDING )
                 ,init_Salary.Salary )
  ,Title_fd  = COALESCE(
                 MAX(o.Title )
                 OVER( PARTITION BY EmployeeId, Title_grp 
                       ORDER BY Auditid 
                       ROWS UNBOUNDED PRECEDING )
                 ,init_Title.Title )
  ,Timestamp
FROM grps o
OUTER APPLY(SELECT 
            TOP 1
              i.Salary
            FROM #Employee i
            WHERE i.EmployeeId = o.EmployeeId
                  AND i.Salary IS NOT NULL )init_Salary
OUTER APPLY(SELECT 
            TOP 1
              i.Title
            FROM #Employee i
            WHERE i.EmployeeId = o.EmployeeId
                  AND i.Title IS NOT NULL )init_Title
ORDER BY
  o.Auditid

The following is a solution using a self join with APPLY. This will likely perform slower.

SELECT
  o.[Auditid] 
  ,o.[EmployeeId] 
  ,[Salary] = COALESCE(o.Salary, filldown_salary.Salary,  init_Salary.Salary)
  ,[Title]  = COALESCE(o.Title , filldown_title.Title  ,  init_Title.Title  )
  ,o.[Timestamp]
FROM [Audit] o
OUTER APPLY(SELECT
            TOP 1
              i.[Salary] 
            FROM [Audit] i
            WHERE i.[EmployeeId] = o.[EmployeeId]
                  AND i.Auditid < o.Auditid
                  AND i.[Salary] IS NOT NULL
            ORDER BY i.[Timestamp] DESC )filldown_salary    
OUTER APPLY(SELECT
            TOP 1
              i.[Title]
            FROM [Audit] i
            WHERE i.[EmployeeId] = o.[EmployeeId]
                  AND i.Auditid < o.Auditid
                  AND i.[Title] IS NOT NULL
            ORDER BY i.[Timestamp] DESC )filldown_title
OUTER APPLY(SELECT 
            TOP 1
              i.Salary
            FROM Employee i
            WHERE i.EmployeeId = o.EmployeeId
                  AND i.Salary IS NOT NULL )init_Salary
OUTER APPLY(SELECT 
            TOP 1
              i.Title
            FROM Employee i
            WHERE i.EmployeeId = o.EmployeeId
                  AND i.Title IS NOT NULL )init_Title

Upvotes: 0

Related Questions