Lior I
Lior I

Reputation: 47

oracle group dates rows by continous range

I need to group and sum the rows of one day according to continuous worker date range.

Table attendance definition:

row_no NUMBER (*,0) NOT NULL,        -- row number - generated from a sequence  
worker_id NUMBER NOT NULL,           -- Attendance worker id  
date1 DATE DEFAULT SYSDATE NOT NULL, -- Attendance Date/time  
type1 NUMBER(3,0) NOT NULL,          -- Attendance type: 0-Enter, 1-Exit  

worker_id date1             type1  
2         13/06/2016-09:00  0  
3         13/06/2016-12:10  0  
2         13/06/2016-13:20  1  
2         13/06/2016-15:00  0  
2         13/06/2016-17:00  1  
3         13/06/2016-18:45  1  
2         13/06/2016-19:00  0  

Result if report is run at 22:00

worker_id date1      fr_hour to_hour hours  
2         13/06/2016 09:00   13:20   4:20  
2         13/06/2016 15:00   17:00   2:00  
2         13/06/2016 19:00   22:00   3:00  
3         13/06/2016 12:10   18:45   6:35  

Upvotes: 1

Views: 114

Answers (2)

Alex Poole
Alex Poole

Reputation: 191235

Taking continuous periods that start on an earlier day complicates it a bit. You can either calculate all the ranges etc. for all dates, back to the start of time - assuming you don't archive off old records and the very first entry for any worker in the data isn't a check-out - and then after doing all that work filter on the date you're interested in. Or you can look only at that date's data and see if a worker's records start with a check-in or check-out.

I've added records for a fourth worker:

 WORKER_ID DATE1                 TYPE1
---------- ---------------- ----------
         4 2016-06-12 19:00          0
         4 2016-06-13 03:00          1
         2 2016-06-13 09:00          0
         3 2016-06-13 12:10          0
         4 2016-06-13 13:00          0
         2 2016-06-13 13:20          1
         4 2016-06-13 14:30          1
         2 2016-06-13 15:00          0
         2 2016-06-13 17:00          1
         3 2016-06-13 18:45          1
         4 2016-06-13 19:00          0
         2 2016-06-13 19:00          0

You can use analytic functions to work out a row number for each entry, and also find the first type1 value for each worker that day; this also effectively pivots to get the time in and out as separate columns:

select worker_id, trunc(date1) as date1, type1,
  case when type1 = 0 then date1 end as time_in,
  case when type1 = 1 then date1 end as time_out,
  row_number() over (partition by worker_id, trunc(date1), type1 order by date1) as rn,
  min(type1) keep (dense_rank first order by date1) over (partition by worker_id, trunc(date1)) as open_start,
  max(type1) keep (dense_rank last order by date1) over (partition by worker_id, trunc(date1)) as open_end,
  row_number() over (partition by worker_id, trunc(date1), type1 order by date1)
    - case when type1 = 1 then min(type1) keep (dense_rank first order by date1)
        over (partition by worker_id, trunc(date1)) else 0 end as grp
from attendance
where date1 >= date '2016-06-13' and date1 < date '2016-06-14'
order by worker_id, attendance.date1;

 WORKER_ID DATE1                 TYPE1 TIME_IN          TIME_OUT                 RN OPEN_START   OPEN_END        GRP
---------- ---------------- ---------- ---------------- ---------------- ---------- ---------- ---------- ----------
         2 2016-06-13 00:00          0 2016-06-13 09:00                           1          0          0          1
         2 2016-06-13 00:00          1                  2016-06-13 13:20          1          0          0          1
         2 2016-06-13 00:00          0 2016-06-13 15:00                           2          0          0          2
         2 2016-06-13 00:00          1                  2016-06-13 17:00          2          0          0          2
         2 2016-06-13 00:00          0 2016-06-13 19:00                           3          0          0          3
         3 2016-06-13 00:00          0 2016-06-13 12:10                           1          0          1          1
         3 2016-06-13 00:00          1                  2016-06-13 18:45          1          0          1          1
         4 2016-06-13 00:00          1                  2016-06-13 03:00          1          1          0          0
         4 2016-06-13 00:00          0 2016-06-13 13:00                           1          1          0          1
         4 2016-06-13 00:00          1                  2016-06-13 14:30          2          1          0          1
         4 2016-06-13 00:00          0 2016-06-13 19:00                           2          1          0          2

The rn column is a raw (naive) attempt to group in and out records together, but for worker 4 that goes out of step. The open_start works out if the first record was a check-out. The value that gets - either zero or 1 - can then be subtracted from rn to get a more useful grouping flag, which I've called grp.

You can then use that as an inline view or CTE and aggregate the time in/out records for each group, adding nvl() or coalesce() to put in missing midnight-start or 10pm-end values:

select worker_id,
  date1 as date1,
  nvl(min(time_in), date1) as fr_hour,
  nvl(max(time_out), date1 + 22/24) as to_hour,
  date1 + (nvl(max(time_out), date1 + 22/24) - date1)
    - (nvl(min(time_in), date1) - date1) as hours
from (
  select worker_id,
    trunc(date1) as date1,
    case when type1 = 0 then date1 end as time_in,
    case when type1 = 1 then date1 end as time_out,
    row_number() over (partition by worker_id, trunc(date1), type1 order by date1)
      - case when type1 = 1 then min(type1) keep (dense_rank first order by date1)
          over (partition by worker_id, trunc(date1)) else 0 end as grp
  from attendance
  where date1 >= date '2016-06-13' and date1 < date '2016-06-14'
)
group by worker_id, date1, grp
order by worker_id, date1, grp;

 WORKER_ID DATE1            FR_HOUR          TO_HOUR          HOURS          
---------- ---------------- ---------------- ---------------- ----------------
         2 2016-06-13 00:00 2016-06-13 09:00 2016-06-13 13:20 2016-06-13 04:20
         2 2016-06-13 00:00 2016-06-13 15:00 2016-06-13 17:00 2016-06-13 02:00
         2 2016-06-13 00:00 2016-06-13 19:00 2016-06-13 22:00 2016-06-13 03:00
         3 2016-06-13 00:00 2016-06-13 12:10 2016-06-13 18:45 2016-06-13 06:35
         4 2016-06-13 00:00 2016-06-13 00:00 2016-06-13 03:00 2016-06-13 03:00
         4 2016-06-13 00:00 2016-06-13 13:00 2016-06-13 14:30 2016-06-13 01:30
         4 2016-06-13 00:00 2016-06-13 19:00 2016-06-13 22:00 2016-06-13 03:00

The 'hours' value manipulates the date and the time-in/out values to come up with what looks like another time; but it's actually the elapsed time.

Finally you can format the columns to remove the bits you aren't interested in:

select worker_id,
  to_char(date1, 'DD/MM/YYYY') as date1,
  to_char(nvl(min(time_in), date1), 'HH24:MI') as fr_hour,
  to_char(nvl(max(time_out), date1 + 22/24), 'HH24:MI') as to_hour,
  to_char(date1 + (nvl(max(time_out),date1 + 22/24) - date1)
    - (nvl(min(time_in), date1) - date1), 'HH24:MI') as hours
from (
  select worker_id,
    trunc(date1) as date1,
    case when type1 = 0 then date1 end as time_in,
    case when type1 = 1 then date1 end as time_out,
    row_number() over (partition by worker_id, trunc(date1), type1 order by date1)
      - case when type1 = 1 then min(type1) keep (dense_rank first order by date1)
          over (partition by worker_id, trunc(date1)) else 0 end as grp
  from attendance
  where date1 >= date '2016-06-13' and date1 < date '2016-06-14'
)
group by worker_id, date1, grp
order by worker_id, date1, grp;

 WORKER_ID DATE1      FR_HO TO_HO HOURS
---------- ---------- ----- ----- -----
         2 13/06/2016 09:00 13:20 04:20
         2 13/06/2016 15:00 17:00 02:00
         2 13/06/2016 19:00 22:00 03:00
         3 13/06/2016 12:10 18:45 06:35
         4 13/06/2016 00:00 03:00 03:00
         4 13/06/2016 13:00 14:30 01:30
         4 13/06/2016 19:00 22:00 03:00

Upvotes: 0

Mottor
Mottor

Reputation: 1948

In the inner query we get for every row the date1,type1 from the next row (LEAD 1) for the same worker and than we filter only what we need:

SELECT worker_id, 
      TRUNC (date1) AS date1, 
      TO_CHAR (date1, 'HH24:MI') fr_hour, 
      TO_CHAR (date2, 'HH24:MI') to_hour, 
      TRUNC ( (date2 - date1) * 24) || ':' || 
        TO_CHAR (TRUNC ( (date2 - date1) * 24 * 60) - TRUNC ( (date2 - date1) * 24) * 60, '00') hours
  FROM (SELECT a.*, 
               LEAD (a.date1, 1) OVER (PARTITION BY worker_id ORDER BY date1) date2, 
               LEAD (a.type1, 1) OVER (PARTITION BY worker_id ORDER BY date1) type2
          FROM testtemp a)
 WHERE type1 = 0 
   AND type2 = 1 
   AND TRUNC (date1) = TRUNC (date2)

Upvotes: 1

Related Questions