Reputation: 185
I am trying to build a dimension table which will identify current and historical data, by using the fields 'IsActive' and 'EffectiveDate'.
IsActive of 1 = Active.
EffectiveDate = Date the record was ingested.
Scenario: I have an existing record with the current view of the employee, however if any information about the employee changes, instead of updating the existing record, I would like to create a new line item with the updated record becoming active and the previous record becoming inactive.
ID | Name | Surname | Age | IsActive | EffectiveDate |
---|---|---|---|---|---|
1 | John | Doe | 54 | 1 | 2021-01-01 |
When a change to the employee data is done, I would like to update the table as follows:
ID | Name | Surname | Age | IsActive | EffectiveDate |
---|---|---|---|---|---|
1 | John | Doe | 54 | 0 | 2021-01-01 |
2 | John | Doe | 64 | 1 | 2021-06-25 |
I am using the combination of 'Name', 'Surname', and 'Age' to identify unique records. The reason I am using these three fields is because there is no other information provided to uniquely identify an employee.
I would really appreciate any assistance.
Upvotes: 0
Views: 3809
Reputation:
You could do this yourself, but I would give Temporal Tables a try first, this is pretty much exactly what they were designed for. You just need to learn slightly different query syntax to find the rows that were "active" at a point in time or during a range.
In the meantime I can share a very simple example of how to do this yourself, but I agree with the comment that Name + Surname + Age is a terrible primary key, because all three of those could change for any given employee (and the age absolutely will change - this is why we usually store Birthdate instead). Why don't we pretend that we assign each employee, somewhere, a unique but meaningless ID (the kind of thing that would go on their access badge or the data in the HR database) just to keep the example simple.
Let's say your table is:
CREATE TABLE dbo.EmployeeHistoryStuff
(
ID bigint NOT NULL IDENTITY(1,1) PRIMARY KEY,
EmployeeID int,
Name nvarchar(50),
Surname nvarchar(50),
Age tinyint,
IsActive bit NOT NULL DEFAULT (1),
EffectiveDate datetime2(0) NOT NULL DEFAULT sysutcdatetime()
);
And we can add a few sample rows:
INSERT dbo.EmployeeHistoryStuff(EmployeeID, Name, Surname, Age)
VALUES(1, N'Aaron', N'Bertrand', 29),
(2, N'Teemu', N'Selanne', 31),
(3, N'Bobby', N'Orr', 62),
(4, N'Wayne', N'Gretzky', 55);
Now we can create a trigger that intercepts any updates any previous rows to IsActive = 1
and inserts a new row:
CREATE TRIGGER dbo.InsteadOfEmployeeHistoryStuff
ON dbo.EmployeeHistoryStuff
INSTEAD OF UPDATE
AS
BEGIN
SET NOCOUNT ON;
DECLARE @now datetime2(0) = sysutcdatetime();
UPDATE old SET IsActive = 0
FROM inserted AS i
INNER JOIN dbo.EmployeeHistoryStuff AS old
ON i.EmployeeID = old.EmployeeID
WHERE old.IsActive = 1;
INSERT dbo.EmployeeHistoryStuff(EmployeeID, Name, Surname, Age, EffectiveDate)
SELECT EmployeeID, Name, Surname, Age, DATEADD(SECOND, 1, @now)
FROM inserted;
END
GO
Now, if we perform an update because some employees had a birthday:
UPDATE dbo.EmployeeHistoryStuff SET Age += 1 WHERE EmployeeID IN (1,2);
ID EmployeeID Name Surname Age IsActive EffectiveDate 1 1 Aaron Bertrand 29 False 2021-10-22 13:37:24 2 2 Teemu Selanne 31 False 2021-10-22 13:37:24 3 3 Bobby Orr 62 True 2021-10-22 13:37:24 4 4 Wayne Gretzky 55 True 2021-10-22 13:37:24 5 1 Aaron Bertrand 30 True 2021-10-22 13:37:30 6 2 Teemu Selanne 32 True 2021-10-22 13:37:30
Then Wayne wants to get out of the limelight:
UPDATE dbo.EmployeeHistoryStuff SET Surname = N'Schmetzky' WHERE EmployeeID = 4;
ID EmployeeID Name Surname Age IsActive EffectiveDate 1 1 Aaron Bertrand 29 False 2021-10-22 13:37:24 2 2 Teemu Selanne 31 False 2021-10-22 13:37:24 3 3 Bobby Orr 62 True 2021-10-22 13:37:24 4 4 Wayne Gretzky 55 False 2021-10-22 13:37:24 5 1 Aaron Bertrand 30 True 2021-10-22 13:37:30 6 2 Teemu Selanne 32 True 2021-10-22 13:37:30 7 4 Wayne Schmetzky 55 True 2021-10-22 13:37:35
Then Wayne changes his mind:
UPDATE dbo.EmployeeHistoryStuff SET Surname = N'Gretzky' WHERE EmployeeID = 4;
ID EmployeeID Name Surname Age IsActive EffectiveDate 1 1 Aaron Bertrand 29 False 2021-10-22 13:37:24 2 2 Teemu Selanne 31 False 2021-10-22 13:37:24 3 3 Bobby Orr 62 True 2021-10-22 13:37:24 4 4 Wayne Gretzky 55 False 2021-10-22 13:37:24 5 1 Aaron Bertrand 30 True 2021-10-22 13:37:30 6 2 Teemu Selanne 32 True 2021-10-22 13:37:30 7 4 Wayne Schmetzky 55 False 2021-10-22 13:37:35 8 4 Wayne Gretzky 55 True 2021-10-22 13:37:40 9 4 Wayne Gretzky 55 True 2021-10-22 13:37:40
My timings are a bit off here, I'll come back and fix in a bit.
Upvotes: 1
Reputation: 5074
You can use the Merge statement as one of the options, to update records in Target table (dim) when matched with Source table (raw table) based on condition.
Here to update the existing record you can’t use ‘Age’ column to define uniqueness in your combination, as your new dataset has the change of data in ‘Age’ column.
Below are my repro details:
Code:
Create procedure sp_employee
AS
SET NOCOUNT ON;
BEGIN
--update Active flag of existing data
MERGE dim_employee AS Target
USING employee_data AS Source
ON Source.Name = Target.Name
and Source.Surname = Target.Surname
WHEN MATCHED THEN UPDATE SET
Target.IsActive = 0;
--Insert new data (assuming the table has only changed data set)
Insert into dim_employee
select Name, Surname, Age, IsActive, EffectiveDate from employee_data;
End
GO
If you have old records in the source table along with the new change set, you can use the below code to insert only the change dataset.
Insert into dim_employee
select Name, Surname, Age, IsActive, EffectiveDate from employee_data
except
select distinct a.Name, a.Surname, a.Age, a.IsActive, a.EffectiveDate from employee_data a
inner join dim_employee b on a.Name = b.Name and a.Surname = b.Surname and a.Age = b.Age;
Execute stored procedure
Exec sp_employee;
After executing stored procedure: Here IsActive column data for ‘John Doe’ got updated to 0 and a new record for ‘John Doe’ got inserted from employee_data table.
Upvotes: 0