Abhishek Dutta
Abhishek Dutta

Reputation: 85

Recursive Query in Azure Synapse Analytics for Dates

In the table below, how to insert rows with the first and last date of years between the START_DATE and END_DATE column?

EMPID EMPNAME START_DATE END_DATE
1001 Shivansh 2015-09-01 2018-03-31
1004 Mayank 2019-04-01 2020-06-30

The output should look as follows:

EMPID EMPNAME START_DATE END_DATE
1001 Shivansh 2015-09-01 2015-12-31
1001 Shivansh 2016-01-01 2016-12-31
1001 Shivansh 2017-01-01 2017-12-31
1001 Shivansh 2018-01-01 2018-03-31
1004 Mayank 2019-04-01 2019-12-31
1004 Mayank 2020-01-01 2020-06-30

This has to be implemented using loops as Azure Synapse Analytics doesn't support Recursive common table expressions

Upvotes: 1

Views: 4900

Answers (2)

Abhishek Dutta
Abhishek Dutta

Reputation: 85

I found the workaround using a loop. The steps followed are as follows:

Create a temporary table to hold initial values.

CREATE TABLE #EMP
(
    EMPID VARCHAR(10),
    EMPNAME VARCHAR(10),
    START_DATE DATE,
    END_DATE DATE
);

Insert initial values.

INSERT INTO #EMP 
SELECT '1001', 'Shivansh', '2015-09-01', '2018-03-31';

INSERT INTO #EMP 
SELECT '1004', 'Mayank', '2019-04-01', '2020-06-30';

Create the required table.

CREATE TABLE #NEWEMP
(
    EMPID VARCHAR(10),
    EMPNAME VARCHAR(10),
    START_DATE DATE,
    END_DATE DATE
);

Insert the first year date (i.e. if the START_DATE for a tuple is 2015-09-01 insert 2015-09-01 for START_DATE and 2015-12-31 for the END_DATE).

INSERT INTO #NEWEMP
SELECT EMPID, EMPNAME, START_DATE, DATEFROMPARTS(YEAR(START_DATE), 12, 31) FROM #EMP;

Similarly, Insert the last year date.

INSERT INTO #NEWEMP
SELECT EMPID, EMPNAME, DATEFROMPARTS(YEAR(END_DATE), 1, 1), END_DATE FROM #EMP;

Run a while loop till the maximum value of the difference between START_DATE and END_DATE column.

DECLARE @counter INT = 1;
DECLARE @len INT = (SELECT MAX(DATEDIFF(YEAR, START_DATE, END_DATE)) FROM #EMP);

WHILE @counter < @len
BEGIN
    INSERT INTO #NEWEMP
    SELECT EMPID, EMPNAME, DATEFROMPARTS(YEAR(START_DATE) + @counter, 1, 1), DATEFROMPARTS(YEAR(START_DATE) + @counter, 12, 31) FROM #EMP
    WHERE @counter < DATEDIFF(YEAR, START_DATE, END_DATE);
    SET @counter += 1;
END

Query the output.

SELECT * FROM #NEWEMP ORDER BY START_DATE;

Here is the Query, written in Azure Synapse Analytics. enter image description here The required output is. enter image description here

Upvotes: 0

wBob
wBob

Reputation: 14379

This approach uses a numbers table and a number of date functions which are available in Azure Synapse Analytics dedicated SQL pools, including DATEFROMPARTS, DATEDIFF and YEAR.

NB This is not a recursive query. There is a loop used in the creation of the numbers table but this is done only once. Once the numbers table exists it can be used for similar scenarios, eg converting recursive CTEs to set-based approaches compatible with Azure Synapse Analytics.

DATEFROMPARTS is used to construct the first day of the year in the calculated records. I then use DATEADD to add one year, then take away one day, to get the last day of the year. DATEDIFF with year is used to determine the gap in years between the two dates and therefore the number of records that need to be added. I then UNION the original and calculated records for the full result.

IF OBJECT_ID('tempdb..#tmp') IS NOT NULL
DROP TABLE #tmp
GO

CREATE TABLE #tmp ( 
    empId       INT NOT NULL, 
    empName     VARCHAR(50) NOT NULL,
    start_date  DATE NOT NULL, 
    end_date    DATE NULL
)
GO


-- Setup test data
INSERT INTO #tmp ( empId, empName, start_date, end_date )
SELECT 1001, 'Shivansh', '2015-09-01', '2018-03-31'
UNION ALL
SELECT 1004, 'Mayank', '2019-04-01', '2020-06-30'
GO

;WITH cte AS (
SELECT *,
    DATEFROMPARTS( YEAR(start_date) + n.number, 1, 1 ) newStart
FROM #tmp t
    CROSS JOIN dbo.numbers n
WHERE n.number <= DATEDIFF( year, start_date, end_date ) 
)
SELECT 'o' s, empId, empName, start_date, 
    CASE
        WHEN YEAR(start_date) = YEAR(end_date) THEN end_date
        ELSE DATEFROMPARTS( YEAR(start_date), 12, 13 )
    END end_date
FROM #tmp
UNION ALL
SELECT 'c', empId, empName, 
    newStart AS start_date,
    CASE 
        WHEN YEAR(end_date) = YEAR(newStart) THEN end_date
        ELSE DATEADD( day, -1, DATEADD( year, 1, newStart ) ) 
    END newEnd
FROM cte
ORDER BY empId, start_date

My results:

My results

I've added the o and c to indicate original and calculated rows but you can remove that column if you like. If you do not have a numbers table already then the script I used to create this one is here. This code has been tested on an Azure Synapse Analytics dedicated SQL pool, version Microsoft Azure SQL Data Warehouse - 10.0.15554.0 Dec 10 2020 03:11:10.

Upvotes: 1

Related Questions