Reputation: 31
I have a table with attendance dates in SQL Workbench/J. I need to define the attendance periods of the people, where any attendance period with gaps less or equal than 90 days are merged into a single attendance period and any gaps larger than that are considered a different attendance period. For example for a single person this is the table I have
id | year | start_date | end_date | prev_att_month | diff |
---|---|---|---|---|---|
1 | 2012 | 2012-08-01 | 2012-08-31 | 2012-07-01 | 31 |
1 | 2012 | 2012-07-01 | 2012-07-31 | 2012-04-01 | 91 |
1 | 2012 | 2012-04-01 | 2012-04-30 | 2012-03-01 | 31 |
1 | 2012 | 2012-03-01 | 2012-03-31 | 2012-02-01 | 29 |
1 | 2012 | 2012-02-01 | 2012-02-29 | 2012-01-01 | 31 |
1 | 2012 | 2012-01-01 | 2012-01-31 | 2011-12-01 | 31 |
1 | 2011 | 2011-12-01 | 2011-12-31 | 2011-11-01 | 30 |
1 | 2011 | 2011-11-01 | 2011-11-30 | 2011-10-01 | 31 |
1 | 2011 | 2011-10-01 | 2011-10-31 | 2011-09-01 | 30 |
1 | 2011 | 2011-09-01 | 2011-09-30 | 2011-08-01 | 31 |
1 | 2011 | 2011-08-01 | 2011-08-31 | 2011-07-01 | 31 |
1 | 2011 | 2011-07-01 | 2011-07-31 | 2011-05-01 | 61 |
1 | 2011 | 2011-05-01 | 2011-05-31 | 2011-04-01 | 30 |
1 | 2011 | 2011-04-01 | 2011-04-30 | 2011-03-01 | 31 |
1 | 2011 | 2011-03-01 | 2011-03-31 | 2011-02-01 | 28 |
1 | 2011 | 2011-02-01 | 2011-02-28 | 2010-08-01 | 184 |
1 | 2010 | 2010-08-01 | 2010-08-31 | 2010-07-01 | 31 |
1 | 2010 | 2010-07-01 | 2010-07-31 | 2010-06-01 | 30 |
1 | 2010 | 2010-06-01 | 2010-06-30 | 2010-05-01 | 31 |
1 | 2010 | 2010-05-01 | 2010-05-31 | 2010-04-01 | 30 |
1 | 2010 | 2010-04-01 | 2010-04-30 |
where I defined the previous attendance month column with a lag function and then found the difference between that column and the the start_date in the diff column. This way I can check the gaps between the attendance months
I want this output of the attendance periods with the 90 day rule explained above:
id | start_date | end_date |
---|---|---|
1 | 1/04/2010 | 31/08/2010 |
1 | 1/02/2011 | 30/04/2012 |
1 | 1/07/2012 | 31/08/2012 |
Does any one have an idea of how to do this?
So far I was just able to define the difference between the attendance months but since this is a large data set I have not been able to find a solution to define the attendance periods without making a row to row analysis.
with [table] as (
select id, year, start_date, end_date,
lag(start_date) over (partition by id order by id, year, start_date) as prev_att_month,
start_date-prev_att_month as diff
from source
)
select *
from [table]
where id = 1
Upvotes: 3
Views: 89
Reputation: 95588
One method would be to use a windowed COUNT
to count how many times a value greater than 90 has appeared in the diff
column, which provides a unique group number. Then you can just group your data into those groups and get the MIN
and MAX
values:
WITH Grps AS(
SELECT V.id,
V.year,
V.start_date,
V.end_date,
V.prev_att_month,
V.diff,
COUNT(CASE WHEN diff > 90 THEN 1 END) OVER (PARTITION BY ID ORDER BY V.start_date ASC) AS Grp
FROM (VALUES(1,2012,CONVERT(date,'20120801'),CONVERT(date,'20120831'),CONVERT(date,'2012-07-01'),31),
(1,2012,CONVERT(date,'20120701'),CONVERT(date,'20120731'),CONVERT(date,'2012-04-01'),91),
(1,2012,CONVERT(date,'20120401'),CONVERT(date,'20120430'),CONVERT(date,'2012-03-01'),31),
(1,2012,CONVERT(date,'20120301'),CONVERT(date,'20120331'),CONVERT(date,'2012-02-01'),29),
(1,2012,CONVERT(date,'20120201'),CONVERT(date,'20120229'),CONVERT(date,'2012-01-01'),31),
(1,2012,CONVERT(date,'20120101'),CONVERT(date,'20120131'),CONVERT(date,'2011-12-01'),31),
(1,2011,CONVERT(date,'20111201'),CONVERT(date,'20111231'),CONVERT(date,'2011-11-01'),30),
(1,2011,CONVERT(date,'20111101'),CONVERT(date,'20111130'),CONVERT(date,'2011-10-01'),31),
(1,2011,CONVERT(date,'20111001'),CONVERT(date,'20111031'),CONVERT(date,'2011-09-01'),30),
(1,2011,CONVERT(date,'20110901'),CONVERT(date,'20110930'),CONVERT(date,'2011-08-01'),31),
(1,2011,CONVERT(date,'20110801'),CONVERT(date,'20110831'),CONVERT(date,'2011-07-01'),31),
(1,2011,CONVERT(date,'20110701'),CONVERT(date,'20110731'),CONVERT(date,'2011-05-01'),61),
(1,2011,CONVERT(date,'20110501'),CONVERT(date,'20110531'),CONVERT(date,'2011-04-01'),30),
(1,2011,CONVERT(date,'20110401'),CONVERT(date,'20110430'),CONVERT(date,'2011-03-01'),31),
(1,2011,CONVERT(date,'20110301'),CONVERT(date,'20110331'),CONVERT(date,'2011-02-01'),28),
(1,2011,CONVERT(date,'20110201'),CONVERT(date,'20110228'),CONVERT(date,'2010-08-01'),184),
(1,2010,CONVERT(date,'20100801'),CONVERT(date,'20100831'),CONVERT(date,'2010-07-01'),31),
(1,2010,CONVERT(date,'20100701'),CONVERT(date,'20100731'),CONVERT(date,'2010-06-01'),30),
(1,2010,CONVERT(date,'20100601'),CONVERT(date,'20100630'),CONVERT(date,'2010-05-01'),31),
(1,2010,CONVERT(date,'20100501'),CONVERT(date,'20100531'),CONVERT(date,'2010-04-01'),30),
(1,2010,CONVERT(date,'20100401'),CONVERT(date,'20100430'),NULL,NULL))V(id,year,start_date,end_date,prev_att_month,diff))
SELECT id,
MIN(Start_date) AS Start_date,
MAX(End_Date) AS End_Date
FROM Grps
GROUP BY Id,
Grp
ORDER BY id,
Start_date;
Upvotes: 3