m.falconelli
m.falconelli

Reputation: 61

Temporal Foreign Key in Sql Database

Which is the best practice to link tables which have unique relation for a determined period of time?

Example 1

Tab_Civil_Status:

ID NAME ID_STATUS
1 Jenny 1
2 Mike 2

Tab_Civil_Status_Desc:

ID_STATUS STATUS_DESC
1 Single
2 Married

Tab_Civil_Events:

PERSON-ID EVENT REF_Date new-Status
2 Wedding 10/5/2017 2

A query to Tab_Civil_Status should provide the below:

Case A:

Select * FROM `Tab_Civil_Status` […] Where REF_Date = 1/1/2020
ID NAME ID_STATUS
1 Jenny 1
2 Mike 2

Case B:

Select * FROM `Tab_Civil_Status` […] Where REF_Date = 1/1/2016
ID NAME ID_STATUS
1 Jenny 1
2 Mike 1

Example 2

Tab_Orders:

ID Order_Code Product_Id
1 a 1
2 b 2

Tab_Products:

Product_Id Product_Name
1 Apple
2 Meta

Tab_Products_Events:

Product_Id EVENT REF_Date Old_Name New_Name
2 Change_Name 1/12/2020 Facebook Meta

These are simplified sample, what I am looking for is the best practice to structure tables and queries for similar cases; the logic to apply rather than the actual code to use.

a - Is the best practice to have an "Event Table" as in the above cases tracking the changes? or there are better ways?

I know there might be sevaral ways to reach the same scope, another sample is the below

Tab_Products_Events_Sample_2:

Product_Id EVENT REF_Date Name
2 Given _Name 1/12/2010 Facebook
2 Change_Name 1/12/2019 Meta

b - is there a more efficient query than the below? Also based on the above point

@Selected_date = '1/1/2019'

SELECT 
    o.[Order_Code], 
    ifnull(e.[Old_Name],p.[Product_Name]) as p_name
FROM Tab_Orders as o
LEFT JOIN Tab_Products as p
ON o.[Product_Id]  = p.[Product_Id]
CROSS APPLY (
   SELECT top 1 t.[Old_Name] 
   FROM Tab_Products_Events as t
   WHERE o.[Product_Id]  = t.[Product_Id]
        and t.[REF_Date] >= @Selected_date
   ORDER BY t.[REF_Date] desc
) e

result

Order_Code p_name
a Apple
b Facebook

Thanks

Upvotes: 1

Views: 902

Answers (1)

Hadi
Hadi

Reputation: 37313

Note: my answer is based on the first example

1st approach: generating date ranges

You should first extract the date ranges for each person status using a similar query:

SELECT 
    PERSON_ID, new_Status, REF_Date, 
    ISNULL(LEAD(REF_Date) OVER(PARTITION BY PERSON_ID ORDER BY REF_DATE), GETDATE()) End_DATE
FROM
    Tab_Civil_Events

As an example, consider the following sample data:

CREATE TABLE Tab_Civil_Events
(
    PERSON_ID INT,  
    EVENT VARCHAR(50),  
    REF_Date DATETIME,  
    new_Status INT
)

INSERT INTO Tab_Civil_Events(PERSON_ID, EVENT, REF_Date, new_Status)
VALUES (2, 'Wedding', '10/5/2017', 2),
       (2, 'Divorce', '10/5/2018', 1),
       (2, 'Wedding', '10/5/2019', 2),
       (2, 'Divorce', '10/5/2020', 1)

SELECT 
    PERSON_ID, new_Status, REF_Date, 
    ISNULL(LEAD(REF_Date) OVER (PARTITION BY PERSON_ID ORDER BY REF_DATE), 
    GETDATE()) End_DATE
FROM
    Tab_Civil_Events

This will result in the following table:

PERSON_ID new_Status REF_Date End_DATE
2 2 2017-10-05T00:00:00Z 2018-10-05T00:00:00Z
2 1 2018-10-05T00:00:00Z 2019-10-05T00:00:00Z
2 2 2019-10-05T00:00:00Z 2020-10-05T00:00:00Z
2 1 2020-10-05T00:00:00Z 2022-01-29T09:47:52.7Z

The second step is to create a common table expression to query the data. For example:

DECLARE @QueryDate DATETIME  = '20190501'

;WITH CTE_1 AS 
(
    SELECT 
        PERSON_ID, 
        new_Status, 
        REF_Date, 
        ISNULL(LEAD(REF_Date) OVER (PARTITION BY PERSON_ID ORDER BY REF_DATE), 
        GETDATE()) End_DATE
    FROM 
        Tab_Civil_Events
)
SELECT 
    CTE_1.PERSON_ID, 
    dsc.STATUS 
FROM
    CTE_1 
INNER JOIN 
    Tab_Civil_Status_Desc dsc ON CTE_1.new_Status = dsc.ID 
WHERE 
    @QueryDate BETWEEN REF_Date AND End_DATE

Using the sample data I provided previously, this query will give the following result:

PERSON_ID STATUS
2 Single

While changing the date value to 20191101

SET @QueryDate = '20191101'

Will result in the following values:

PERSON_ID STATUS
2 Married

Make sure that you have created the appropriate indexes to support your queries.

For example:

CREATE NONCLUSTERED  INDEX IX_PersonId_RefDate 
ON Tab_Civil_Events(PERSON_ID, REF_DATE) 
INCLUDE(new_status);

2nd approach: retrieving the nearest event date

You should first retrieve the maximum event dates for each person having a value smallest than the given date:

SELECT 
    PERSON_ID,
    MAX(Ref_Date) AS event_date 
FROM 
    Tab_Civil_Events
WHERE 
    Ref_Date <= @QueryDate
GROUP BY 
    PERSON_ID

Next, you should join this query with the events table to get the status information as follows:

DECLARE @QueryDate DATETIME  = '20190501'

SELECT 
    tbl.PERSON_ID, dsc.STATUS
FROM
    Tab_Civil_Events tbl 
INNER JOIN 
    Tab_Civil_Status_Desc dsc ON tbl.new_Status = dsc.ID 
INNER JOIN
    (SELECT 
         PERSON_ID,
         MAX(Ref_Date) AS event_date 
     FROM 
         Tab_Civil_Events
     WHERE 
         Ref_Date <= @QueryDate
     GROUP BY 
         PERSON_ID) t1 ON tbl.PERSON_ID = t1.PERSON_ID 
                       AND t1.event_date = tbl.ref_date

3rd approach: temporal tables

I am posting this part after you mentioned that you can change the way the data is stored.

Using SQL Server 2016 or a newer version, you can use temporal tables to simplify querying your data.

As mentioned in the official documentation:

SQL Server 2016 introduced support for temporal tables (also known as system-versioned temporal tables) as a database feature that brings built-in support for providing information about data stored in the table at any point in time rather than only the data that is correct at the current moment in time.

Temporal tables are implemented as a pair of tables, a current table and a history table. Within each of these tables, start and end dates are stored and used to define the period of validity for each row.

You can refer to the following resources to learn more about this type of data structure:

Upvotes: 1

Related Questions