JCR
JCR

Reputation: 31

SQL Conditional update column value based on previous row value

I have a table with attendance dates in SQL Workbench/J. I need to define the attendance periods of the people, where any attendance period with gaps less or equal than 90 days are merged into a single attendance period and any gaps larger than that are considered a different attendance period. For example for a single person this is the table I have

id year start_date end_date prev_att_month diff
1 2012 2012-08-01 2012-08-31 2012-07-01 31
1 2012 2012-07-01 2012-07-31 2012-04-01 91
1 2012 2012-04-01 2012-04-30 2012-03-01 31
1 2012 2012-03-01 2012-03-31 2012-02-01 29
1 2012 2012-02-01 2012-02-29 2012-01-01 31
1 2012 2012-01-01 2012-01-31 2011-12-01 31
1 2011 2011-12-01 2011-12-31 2011-11-01 30
1 2011 2011-11-01 2011-11-30 2011-10-01 31
1 2011 2011-10-01 2011-10-31 2011-09-01 30
1 2011 2011-09-01 2011-09-30 2011-08-01 31
1 2011 2011-08-01 2011-08-31 2011-07-01 31
1 2011 2011-07-01 2011-07-31 2011-05-01 61
1 2011 2011-05-01 2011-05-31 2011-04-01 30
1 2011 2011-04-01 2011-04-30 2011-03-01 31
1 2011 2011-03-01 2011-03-31 2011-02-01 28
1 2011 2011-02-01 2011-02-28 2010-08-01 184
1 2010 2010-08-01 2010-08-31 2010-07-01 31
1 2010 2010-07-01 2010-07-31 2010-06-01 30
1 2010 2010-06-01 2010-06-30 2010-05-01 31
1 2010 2010-05-01 2010-05-31 2010-04-01 30
1 2010 2010-04-01 2010-04-30

where I defined the previous attendance month column with a lag function and then found the difference between that column and the the start_date in the diff column. This way I can check the gaps between the attendance months

I want this output of the attendance periods with the 90 day rule explained above:

id start_date end_date
1 1/04/2010 31/08/2010
1 1/02/2011 30/04/2012
1 1/07/2012 31/08/2012

Does any one have an idea of how to do this?

So far I was just able to define the difference between the attendance months but since this is a large data set I have not been able to find a solution to define the attendance periods without making a row to row analysis.

with [table] as (
    select id, year, start_date, end_date, 
        lag(start_date) over (partition by id order by id, year, start_date) as prev_att_month, 
        start_date-prev_att_month as diff
    from source
) 
select *
from [table]
where id = 1

Upvotes: 3

Views: 89

Answers (1)

Thom A
Thom A

Reputation: 95588

One method would be to use a windowed COUNT to count how many times a value greater than 90 has appeared in the diff column, which provides a unique group number. Then you can just group your data into those groups and get the MIN and MAX values:

WITH Grps AS(
    SELECT V.id,
           V.year,
           V.start_date,
           V.end_date,
           V.prev_att_month,
           V.diff,
           COUNT(CASE WHEN diff > 90 THEN 1 END) OVER (PARTITION BY ID ORDER BY V.start_date ASC) AS Grp
    FROM (VALUES(1,2012,CONVERT(date,'20120801'),CONVERT(date,'20120831'),CONVERT(date,'2012-07-01'),31),
                (1,2012,CONVERT(date,'20120701'),CONVERT(date,'20120731'),CONVERT(date,'2012-04-01'),91),
                (1,2012,CONVERT(date,'20120401'),CONVERT(date,'20120430'),CONVERT(date,'2012-03-01'),31),
                (1,2012,CONVERT(date,'20120301'),CONVERT(date,'20120331'),CONVERT(date,'2012-02-01'),29),
                (1,2012,CONVERT(date,'20120201'),CONVERT(date,'20120229'),CONVERT(date,'2012-01-01'),31),
                (1,2012,CONVERT(date,'20120101'),CONVERT(date,'20120131'),CONVERT(date,'2011-12-01'),31),
                (1,2011,CONVERT(date,'20111201'),CONVERT(date,'20111231'),CONVERT(date,'2011-11-01'),30),
                (1,2011,CONVERT(date,'20111101'),CONVERT(date,'20111130'),CONVERT(date,'2011-10-01'),31),
                (1,2011,CONVERT(date,'20111001'),CONVERT(date,'20111031'),CONVERT(date,'2011-09-01'),30),
                (1,2011,CONVERT(date,'20110901'),CONVERT(date,'20110930'),CONVERT(date,'2011-08-01'),31),
                (1,2011,CONVERT(date,'20110801'),CONVERT(date,'20110831'),CONVERT(date,'2011-07-01'),31),
                (1,2011,CONVERT(date,'20110701'),CONVERT(date,'20110731'),CONVERT(date,'2011-05-01'),61),
                (1,2011,CONVERT(date,'20110501'),CONVERT(date,'20110531'),CONVERT(date,'2011-04-01'),30),
                (1,2011,CONVERT(date,'20110401'),CONVERT(date,'20110430'),CONVERT(date,'2011-03-01'),31),
                (1,2011,CONVERT(date,'20110301'),CONVERT(date,'20110331'),CONVERT(date,'2011-02-01'),28),
                (1,2011,CONVERT(date,'20110201'),CONVERT(date,'20110228'),CONVERT(date,'2010-08-01'),184),
                (1,2010,CONVERT(date,'20100801'),CONVERT(date,'20100831'),CONVERT(date,'2010-07-01'),31),
                (1,2010,CONVERT(date,'20100701'),CONVERT(date,'20100731'),CONVERT(date,'2010-06-01'),30),
                (1,2010,CONVERT(date,'20100601'),CONVERT(date,'20100630'),CONVERT(date,'2010-05-01'),31),
                (1,2010,CONVERT(date,'20100501'),CONVERT(date,'20100531'),CONVERT(date,'2010-04-01'),30),
                (1,2010,CONVERT(date,'20100401'),CONVERT(date,'20100430'),NULL,NULL))V(id,year,start_date,end_date,prev_att_month,diff))
SELECT id,
       MIN(Start_date) AS Start_date,
       MAX(End_Date) AS End_Date
FROM Grps
GROUP BY Id,
         Grp
ORDER BY id,
         Start_date;

Upvotes: 3

Related Questions