Reputation: 1
I need to create a date range based on a date from a monthly file with the results grouped by EmpID and StatusCode.
The "start" date would be the beginnning of the earliest month and "end" would be the last day of the month for the last month in the grouping. Thank you!
Table:
EmpID StatusCode AsOF
12345 J 6/30/2014
12345 J 7/31/2014
12345 J 8/29/2014
12345 J 9/30/2014
12345 G 10/31/2014
12345 G 11/28/2014
12345 G 12/31/2014
12345 G 1/30/2015
12345 G 2/27/2015
12345 M 3/30/2015
12345 M 4/30/2015
12345 M 5/29/2015
12345 M 6/30/2015
12345 G 7/31/2015
12345 G 8/31/2015
12345 G 9/30/2015
12345 G 10/30/2015
12345 G 11/30/2015
Expected Result:
EmpID StatusCode Start End
12345 J 6/1/2014 9/30/2014
12345 G 10/1/2014 2/28/2015
12345 M 3/1/2015 6/30/2015
12345 G 7/1/2015 11/30/2015
Upvotes: 0
Views: 83
Reputation: 4797
The group by
logic is not only dependent on EmpID
and StatusCode
(as stated in the question) but also dependent on what is called gaps-and-islands. For example, the expected output has 2 records for EmpID
12345 and StatusCode
G because there are 2 islands (2014/10 - 2015/2 and 2015/7 - 2015/11) and a gap between them being (2015/3 - 2015/6).
Sample Data:
I used the sample data that @RazvanSocol was so nice to type up, but including it here just in case his answer gets modified later.
CREATE TABLE #sample_Data (
EmpID int NOT NULL,
StatusCode CHAR(1) NOT NULL,
AsOf DATE NOT NULL,
UNIQUE (EmpID, AsOf)
)
INSERT INTO #sample_Data (EmpID, StatusCode, AsOf) VALUES
(12345,'J','20140630'),
(12345,'J','20140731'),
(12345,'J','20140829'),
(12345,'J','20140930'),
(12345,'G','20141031'),
(12345,'G','20141128'),
(12345,'G','20141231'),
(12345,'G','20150130'),
(12345,'G','20150227'),
(12345,'M','20150330'),
(12345,'M','20150430'),
(12345,'M','20150529'),
(12345,'M','20150630'),
(12345,'G','20150731'),
(12345,'G','20150831'),
(12345,'G','20150930'),
(12345,'G','20151030'),
(12345,'G','20151130')
Answer:
This answer will work assuming that you are on SQL Server 2012 or later. The query below takes advantage of Window Functions such as lag
and sum
to identify where the islands begin and assign an IslandNbr
to them. On the final outer query, there is a datediff
calculation to determine the first day of that month, and a eomonth
function to determine the last day of the month for their corresponding input AsOf
dates.
select b.EmpID
, b.StatusCode
, cast(dateadd(month, datediff(month, 0, min(b.AsOf)), 0)as date) as [Start]
, eomonth(max(b.AsOf)) as [End]
from (
select a.EmpID
, a.StatusCode
, a.AsOf
, sum(a.IslandBegin) over (partition by a.EmpID, a.StatusCode order by a.AsOf) as IslandNbr
from (
select d.EmpID
, d.StatusCode
, d.AsOf
, case when datediff(month, lag(d.AsOf, 1, null) over (partition by d.EmpID, d.StatusCode order by d.AsOf asc), d.AsOf) = 1 then 0 else 1 end as IslandBegin
from #sample_Data as d
) as a
) as b
group by b.EmpID
, b.StatusCode
, b.IslandNbr
order by 3
Output:
The output matches the expected results exactly.
+-------+------------+------------+------------+
| EmpID | StatusCode | Start | End |
+-------+------------+------------+------------+
| 12345 | J | 2014-06-01 | 2014-09-30 |
| 12345 | G | 2014-10-01 | 2015-02-28 |
| 12345 | M | 2015-03-01 | 2015-06-30 |
| 12345 | G | 2015-07-01 | 2015-11-30 |
+-------+------------+------------+------------+
Upvotes: 2
Reputation: 5694
You can use something like this:
/*
CREATE TABLE Table1 (
EmpID int NOT NULL,
StatusCode CHAR(1) NOT NULL,
AsOf DATE NOT NULL,
UNIQUE (EmpID, AsOf)
)
INSERT INTO dbo.Table1 (EmpID, StatusCode, AsOf) VALUES
(12345,'J','20140630'),
(12345,'J','20140731'),
(12345,'J','20140829'),
(12345,'J','20140930'),
(12345,'G','20141031'),
(12345,'G','20141128'),
(12345,'G','20141231'),
(12345,'G','20150130'),
(12345,'G','20150227'),
(12345,'M','20150330'),
(12345,'M','20150430'),
(12345,'M','20150529'),
(12345,'M','20150630'),
(12345,'G','20150731'),
(12345,'G','20150831'),
(12345,'G','20150930'),
(12345,'G','20151030'),
(12345,'G','20151130')
*/
SELECT DISTINCT Q2.EmpID, Q2.StatusCode,
DATEADD(MONTH,DATEDIFF(MONTH,'20000101',Q2.FirstAsOf),'20000101') AS StartDate,
DATEADD(DAY,-1,DATEADD(MONTH,DATEDIFF(MONTH,'20000101',Q2.LastAsOf)+1,'20000101')) AS EndDate
FROM (
SELECT *,
MIN(Q1.AsOf) OVER (PARTITION BY Q1.EmpID,Q1.StatusCode,Q1.Dif) AS FirstAsOf,
MAX(Q1.AsOf) OVER (PARTITION BY Q1.EmpID,Q1.StatusCode,Q1.Dif) AS LastAsOf
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY t.EmpID ORDER BY t.AsOf)
-ROW_NUMBER() OVER (PARTITION BY t.EmpID,t.StatusCode ORDER BY t.AsOf) AS Dif
FROM dbo.Table1 t
) Q1
) Q2
ORDER BY EmpID, StartDate
Upvotes: 1