Reputation: 85
In the table below, how to insert rows with the first and last date of years between the START_DATE and END_DATE column?
EMPID | EMPNAME | START_DATE | END_DATE |
---|---|---|---|
1001 | Shivansh | 2015-09-01 | 2018-03-31 |
1004 | Mayank | 2019-04-01 | 2020-06-30 |
The output should look as follows:
EMPID | EMPNAME | START_DATE | END_DATE |
---|---|---|---|
1001 | Shivansh | 2015-09-01 | 2015-12-31 |
1001 | Shivansh | 2016-01-01 | 2016-12-31 |
1001 | Shivansh | 2017-01-01 | 2017-12-31 |
1001 | Shivansh | 2018-01-01 | 2018-03-31 |
1004 | Mayank | 2019-04-01 | 2019-12-31 |
1004 | Mayank | 2020-01-01 | 2020-06-30 |
This has to be implemented using loops as Azure Synapse Analytics doesn't support Recursive common table expressions
Upvotes: 1
Views: 4900
Reputation: 85
I found the workaround using a loop. The steps followed are as follows:
Create a temporary table to hold initial values.
CREATE TABLE #EMP
(
EMPID VARCHAR(10),
EMPNAME VARCHAR(10),
START_DATE DATE,
END_DATE DATE
);
Insert initial values.
INSERT INTO #EMP
SELECT '1001', 'Shivansh', '2015-09-01', '2018-03-31';
INSERT INTO #EMP
SELECT '1004', 'Mayank', '2019-04-01', '2020-06-30';
Create the required table.
CREATE TABLE #NEWEMP
(
EMPID VARCHAR(10),
EMPNAME VARCHAR(10),
START_DATE DATE,
END_DATE DATE
);
Insert the first year date (i.e. if the START_DATE for a tuple is 2015-09-01 insert 2015-09-01 for START_DATE and 2015-12-31 for the END_DATE).
INSERT INTO #NEWEMP
SELECT EMPID, EMPNAME, START_DATE, DATEFROMPARTS(YEAR(START_DATE), 12, 31) FROM #EMP;
Similarly, Insert the last year date.
INSERT INTO #NEWEMP
SELECT EMPID, EMPNAME, DATEFROMPARTS(YEAR(END_DATE), 1, 1), END_DATE FROM #EMP;
Run a while loop till the maximum value of the difference between START_DATE and END_DATE column.
DECLARE @counter INT = 1;
DECLARE @len INT = (SELECT MAX(DATEDIFF(YEAR, START_DATE, END_DATE)) FROM #EMP);
WHILE @counter < @len
BEGIN
INSERT INTO #NEWEMP
SELECT EMPID, EMPNAME, DATEFROMPARTS(YEAR(START_DATE) + @counter, 1, 1), DATEFROMPARTS(YEAR(START_DATE) + @counter, 12, 31) FROM #EMP
WHERE @counter < DATEDIFF(YEAR, START_DATE, END_DATE);
SET @counter += 1;
END
Query the output.
SELECT * FROM #NEWEMP ORDER BY START_DATE;
Here is the Query, written in Azure Synapse Analytics.
The required output is.
Upvotes: 0
Reputation: 14379
This approach uses a numbers table and a number of date functions which are available in Azure Synapse Analytics dedicated SQL pools, including DATEFROMPARTS
, DATEDIFF
and YEAR
.
NB This is not a recursive query. There is a loop used in the creation of the numbers table but this is done only once. Once the numbers table exists it can be used for similar scenarios, eg converting recursive CTEs to set-based approaches compatible with Azure Synapse Analytics.
DATEFROMPARTS
is used to construct the first day of the year in the calculated records. I then use DATEADD
to add one year, then take away one day, to get the last day of the year. DATEDIFF
with year is used to determine the gap in years between the two dates and therefore the number of records that need to be added. I then UNION
the original and calculated records for the full result.
IF OBJECT_ID('tempdb..#tmp') IS NOT NULL
DROP TABLE #tmp
GO
CREATE TABLE #tmp (
empId INT NOT NULL,
empName VARCHAR(50) NOT NULL,
start_date DATE NOT NULL,
end_date DATE NULL
)
GO
-- Setup test data
INSERT INTO #tmp ( empId, empName, start_date, end_date )
SELECT 1001, 'Shivansh', '2015-09-01', '2018-03-31'
UNION ALL
SELECT 1004, 'Mayank', '2019-04-01', '2020-06-30'
GO
;WITH cte AS (
SELECT *,
DATEFROMPARTS( YEAR(start_date) + n.number, 1, 1 ) newStart
FROM #tmp t
CROSS JOIN dbo.numbers n
WHERE n.number <= DATEDIFF( year, start_date, end_date )
)
SELECT 'o' s, empId, empName, start_date,
CASE
WHEN YEAR(start_date) = YEAR(end_date) THEN end_date
ELSE DATEFROMPARTS( YEAR(start_date), 12, 13 )
END end_date
FROM #tmp
UNION ALL
SELECT 'c', empId, empName,
newStart AS start_date,
CASE
WHEN YEAR(end_date) = YEAR(newStart) THEN end_date
ELSE DATEADD( day, -1, DATEADD( year, 1, newStart ) )
END newEnd
FROM cte
ORDER BY empId, start_date
My results:
I've added the o and c to indicate original and calculated rows but you can remove that column if you like. If you do not have a numbers table already then the script I used to create this one is here. This code has been tested on an Azure Synapse Analytics dedicated SQL pool, version Microsoft Azure SQL Data Warehouse - 10.0.15554.0 Dec 10 2020 03:11:10.
Upvotes: 1