Using RANK() in SQL as ID Number for Groups of Records

This is my table:

employeeid  workdate                workstatus
----------- ----------------------- ----------
1           2020-09-01 00:00:00.000 ON 
1           2020-09-02 00:00:00.000 ON 
1           2020-09-03 00:00:00.000 ON 
1           2020-09-04 00:00:00.000 OFF
1           2020-09-05 00:00:00.000 OFF
2           2020-09-01 00:00:00.000 ON 
2           2020-09-02 00:00:00.000 ON 
2           2020-09-03 00:00:00.000 OFF
2           2020-09-04 00:00:00.000 OFF
2           2020-09-05 00:00:00.000 ON

And I am executing this query:

select employeeid, workdate, workstatus, rank() over(partition by employeeid, workstatus order by workdate) as cycle
from #workstatus
order by 1, 2

With this result:

employeeid  workdate                workstatus cycle
----------- ----------------------- ---------- --------------------
1           2020-09-01 00:00:00.000 ON         1
1           2020-09-02 00:00:00.000 ON         2
1           2020-09-03 00:00:00.000 ON         3
1           2020-09-04 00:00:00.000 OFF        1
1           2020-09-05 00:00:00.000 OFF        2
2           2020-09-01 00:00:00.000 ON         1
2           2020-09-02 00:00:00.000 ON         2
2           2020-09-03 00:00:00.000 OFF        1
2           2020-09-04 00:00:00.000 OFF        2
2           2020-09-05 00:00:00.000 ON         3

My goal is to have the "cycle" of on/off work be identified by a unique number per employee. So the three ON days for employee 1 would be cycle 1, then the two OFF days would be cycle 2.

The first two ON days for employee 2 would be cycle 1, then the two OFF days would be cycle 2, and the final ON day would be cycle 3.

I'm not sure if I can use RANK() for this, or if there is a better solution. Thanks!

Upvotes: 2

Answers (3)

GMB

Reputation: 222582

You can use window functions to solve this gaps-and-islands problem. One approach is to take the difference between row numbers to build groups of "adjacent" records:

select employeeid, workdate, workstatus,
    row_number() over(partition by employeeid, workstatus, rn1 - rn2 order by workdate) cycle
from (
    select t.*,
        row_number() over(partition by employeeid order by workdate) rn1,
        row_number() over(partition by employeeid, workstatus order by workdate) rn2
    from mytable t
) t

Upvotes: 0

Himanshu

Reputation: 3970

Use dense_rank instead of rank

Upvotes: 0

Gordon Linoff

Reputation: 1270391

This is a type of gaps-and-islands problem. For this version, use lag() and a cumulative sum:

select t.*,
       sum(case when prev_ws= workstatus then 0 else 1 end) over 
           (partition by employeeid order by workdate) as ranking
from (select t.*, 
             lag(workstatus) over (partition by employeeid order by workdate) as prev_ws
      from t
     ) t;

Upvotes: 1

Using RANK() in SQL as ID Number for Groups of Records

Answers (3)

Related Questions