Dede Soetopo
Dede Soetopo

Reputation: 65

Count Data by Loop Calendar SQL/Oracle

I need to get the data that generates count of total ID by date between date_active and date_end using date ranges for each. If the dates are crossing each other the ID will adding up. here is the data I have right now,

  TABLE CONTRACT:
  ID    DATE_ACTIVE    DATE_END
  1   05-FEB-13      08-NOV-13
  1   21-DEC-18      06-OCT-19
  2   05-FEB-13      27-JAN-14
  3   05-FEB-13      07-NOV-13
  4   06-FEB-13      02-NOV-13
  4   25-OCT-14      13-APR-16

  TABLE CALENDAR:
  DT
  05-FEB-13
  06-FEB-13
  07-FEB-13
  08-FEB-13
  09-FEB-13
  ..-DEC-19

what I want out is basically like this:

 DT         COUNT(ID)
 05-FEB-13    3
 06-FEB-13    4
 07-FEB-13    4
 08-FEB-13    4
 09-FEB-13    4
 10-FEB-13    4
 ....
 03-NOV-13    3
 ....
 08-NOV-13    2
 09-NOV-13    1
 ....
 28-JAN-14    0
 ....
 25-OCT-14    1
 ....
 13-APR-16    1
 14-APR-16    0
 ....
 21-DEC-18    1
 ....
 06-OCT-19    1
 07-OCT-19    0
 ....
 ....

And here is my query to get that result

with contract as (
select * from contract
where id in ('1','2','3','4')
)
,
cal as 
(
select TRUNC (SYSDATE - ROWNUM) dt
from dual
connect by rownum < sysdate - to_date('05-FEB-13')
)

select aa.dt,count(distinct bb.id)id from cal aa
left join contract bb on aa.dt >= bb.date_active and aa.dt<= bb.date_end
group by aa.dt
order by 1

but the problem is I have 6 mio of ID and if I use this kind of query, the result maybe will take forever, and I'm having a hard times to figured out how to get the result with different query. It will be my pleasure if somebody can help me out of this. Thank you so much.

Upvotes: 2

Views: 368

Answers (2)

Quassnoi
Quassnoi

Reputation: 425341

If you group your events by date_active and date_end, you will get the numbers of events which have started and ended on each separate day.

Not a lot of days have passed between 2013 and 2019 (about 2 000), so the grouped resultsets will be relatively short.

Now that you have the two groups, you can notice that the number of events on each given date is the number of events which have started on or before this date, minus the number of events which have finished on or before this date (I'm assuming the end dates are non-inclusive).

In other words, the number of events on every given day is:

  1. The number of events on the previous date,
  2. plus the number of events started on this date,
  3. minus the number of events ended on this date.

This can be easily done using a window function.

This will require a join between the calendar table and the two groups, but fortunately all of them are relatively short (thousands of records) and the join would be fast.

Here's the query: http://sqlfiddle.com/#!4/b21ce/5

WITH    cal AS
        (
        SELECT  TRUNC (to_date('01-NOV-13') - ROWNUM) dt
        FROM    dual
        CONNECT BY
                rownum < to_date('01-NOV-13')- to_date('01-FEB-13')
        ),
        started_on AS
        (
        SELECT  date_active AS dt, COUNT(*) AS cnt_start
        FROM    contract
        GROUP BY
                date_active
        ),
        ended_on AS
        (
        SELECT  date_end AS dt, COUNT(*) AS cnt_end
        FROM    contract
        GROUP BY
                date_end
        )
SELECT  dt,
        SUM(COALESCE(cnt_start, 0) - COALESCE(cnt_end, 0)) OVER (ORDER BY dt) cnt
FROM    cal c
LEFT JOIN
        started_on s
USING   (dt)
LEFT JOIN
        ended_on e
USING   (dt)

(I used a fixed date instead of SYSDATE to keep the resultset short, but the idea is the same)

This query requires that the calendar starts before the earliest event, otherwise every result will be off by a fixed amount, the number of events before the beginning of the calendar.

You can replace the fixed date in the calendar condition with (SELECT MIN(date_active) FROM contract) which is instant if date_active is indexed.

Update:

If your contract dates can overlap and you want to collapse multiple overlapping contracts into a one continuous contract, you can use window functions to do so.

WITH    cal AS
        (
        SELECT  TRUNC (to_date('01-NOV-13') - ROWNUM) dt
        FROM    dual
        CONNECT BY
                rownum <= to_date('01-NOV-13')- to_date('01-FEB-13')
        ),
        collapsed_contract AS
        (
        SELECT  *
        FROM    (
                SELECT  c.*,
                        COALESCE(LAG(date_end_effective) OVER (PARTITION BY id ORDER BY date_active), date_active) AS date_start_effective
                FROM    (
                        SELECT  c.*,
                                MAX(date_end) OVER (PARTITION BY id ORDER BY date_active) AS date_end_effective
                        FROM    contract c
                        ) c
                ) c
        WHERE   date_start_effective < date_end_effective
        ),
        started_on AS
        (
        SELECT  date_start_effective AS dt, COUNT(*) AS cnt_start
        FROM    collapsed_contract
        GROUP BY
                date_start_effective
        ),
        ended_on AS
        (
        SELECT  date_end_effective AS dt, COUNT(*) AS cnt_end
        FROM    collapsed_contract
        GROUP BY
                date_end_effective
        )
SELECT  dt,
        SUM(COALESCE(cnt_start, 0) - COALESCE(cnt_end, 0)) OVER (ORDER BY dt) cnt
FROM    cal c
LEFT JOIN
        started_on s
USING   (dt)
LEFT JOIN
        ended_on e
USING   (dt)

http://sqlfiddle.com/#!4/adeba/1

The query might seem bulky, but that's to make it more efficient, as all these window functions can be calculated in a single pass over the table.

Note however that this single pass relies on the table being sorted on (id, date_active) so an index on these two fields is crucial.

Upvotes: 1

Barbaros &#214;zhan
Barbaros &#214;zhan

Reputation: 65218

Firstly, row_number() over (order by id,date_active) analytic function is used in order to generate unique ID values those will be substituted in connect by level <= ... and prior id = id syntax to get unpivoted hierarchical data :

with t0 as
(
  select row_number() over (order by id,date_active) as id, date_active, date_end
    from contract
), t1 as
(
  select date_active + level - 1 as dt
    from t0
  connect by level <= date_end - date_active + 1
      and prior id = id
      and prior sys_guid() is not null
)
select dt, count(*) 
  from t1    
 group by dt 
 order by dt

Demo

Upvotes: 0

Related Questions