red
red

Reputation: 37

Overlapping Dates Exclusion

This is possibly a rather simple Question, but nevertheless a challenge for me. I have a table with 4 fields Person_id, Store_id, startdate and enddate. For a particular value for person_id, there can a number of records with different start and end dates. Here I need to find the time gap if it is more than 24 hours of current end date to next start date. The issue here is for each person_id there may records with overlapping periods.

Example1:

Person_id       Store_ID          Startdate           enddate
10000351067      10000232561      2010-04-08          2010-08-06
10000351067      10000232561      2016-09-09          2016-09-16
10000351067      10000232561      2016-09-16          2016-10-03
10000351067      10000232561      2016-10-03          2016-10-07
10000351067      10000232561      2016-10-07          2017-01-17
10000351067      10000232561      2017-01-17          2018-04-05
**10000351067      10000232561      2017-06-16          2017-06-20**
10000351067      10000232561      2018-04-05          NULL

Example2:

10000193858      10000225875      2016-07-13          2016-08-03
10000193858      10000225875      2016-08-03          2017-05-17
10000193858      10000225875      2017-05-17          2017-06-05
10000193858      10000225875      2017-05-31          2017-06-05
10000193858      10000225875      2017-06-05          2017-06-13
10000193858      10000225875      2017-06-13          2017-08-16
10000193858      10000225875      2017-08-07          2017-08-16
10000193858      10000225875      2017-08-16          2017-08-18
10000193858      10000225875      2017-08-18          2017-08-31
10000193858      10000225875      2017-08-31          2018-01-05
**10000193858      10000225875      2017-11-13          2017-11-20**
10000193858      10000225875      2018-01-05          NULL

The record with the oldest 'startdate' needs to be retained in all cases. In the situation where there are multiple records with the same oldest 'startdate', the one with the largest 'enddate' needs to be retained. I tried by using a below query, but was unsuccessful (Possibly I have done something wrong).

CREATE TABLE #ordered_removal_list(
                [ID_New] [int] IDENTITY(1,1) NOT NULL,
                [person_id] [bigint] NULL,
                [Store_ID] [bigint] NULL,
                [started_at] [datetime] NULL,
                [ended_at] [datetime] NULL,     
)
INSERT INTO #ordered_removal_list 
(person_id,Store_ID,started_at,ended_at)
SELECTperson_id,Store_ID,started_at,ended_at into #test FROM Temp_Data
;WITH cte 
AS
(
SELECT ord1.person_id, ord1.started_at, ord1.ended_at, next1.started_at as next1_start,
Last1.started_at as last1_start, CASE WHEN DATEDIFF (HOUR, last1.ended_at, ord1.started_at) > 23 THEN 'GAP' ELSE 'NO_GAP' END as 'gap'
FROM #test ord1
LEFT JOIN #test next1 on next1.[ID_New] = ord1.[ID_New] + 1 and ord1.person_id = next1.person_id)
SELECT * FROM cte 
where gap = 'GAP'

I was unable to exclude the overlapping dates which are marked in red color in above examples.Any suggestions (preferably code examples)

Result set:

Example 1) IF I exclude the overlapping dates 10000351067, 10000232561, 2017-06-16, 2017-06-20 then their is time gap between current record enddate to following next start date i.e., No time gap between whole time period as compared to very next period.

Example 2) IF I exclude the overlapping dates 10000193858, 10000225875, 2017-11-13, 2017-11-20 then their is time gap between current record enddate to following next start date i.e., No time gap between whole time period as compared to very next period.

Thanks!

Upvotes: 0

Views: 569

Answers (3)

Sentinel
Sentinel

Reputation: 6449

Here's a solution that appears to work for the provided data. It works by checking that there are no overlapping records (p) that supersedes the current (c) record. An overlapping record supersedes the current record if it starts before the current record, or starts at the same time as the current record but ends after the current record NULLs in the start or end date columns are treated as the beginning or end of time respectively.

To determine record precedence add a row number column (RN) which also serves to prevent comparing a record with itself. When comparing records, the preceding record will have a row number less than the current record.

Once that's done all that remains is to check for overlaps See this SQL Fiddle for an example (please note I added 1 sample record with concurrent start times to your second data set to test for that condition):

with dta as (
  select row_number()
    over (partition by person_id, store_id
          order by case when startdate is null then 1 else 0 end
                 , startdate
                 , case when enddate is null then 1 else 2 end
                 , enddate desc) rn
       , a.*
    from YourData a
 )
 select * from dta c
 where not exists (
     select 1 from dta p
      where p.person_id = c.person_id
        and p.store_id = c.store_id
        -- Establish precedence
        and p.rn < c.rn
        -- Detect overlaps
        and (p.startdate is null or p.startdate < c.enddate or c.enddate is null)
        and (c.startdate is null or c.startdate < p.enddate or p.enddate is null)
   )

order by Person_id, store_id, startdate

Results:

| rn |   Person_id |    Store_ID |            Startdate |              enddate |
|----|-------------|-------------|----------------------|----------------------|
|  1 | 10000193858 | 10000225875 | 2016-07-13T00:00:00Z | 2016-08-03T00:00:00Z |
|  2 | 10000193858 | 10000225875 | 2016-08-03T00:00:00Z | 2017-05-17T00:00:00Z |
|  3 | 10000193858 | 10000225875 | 2017-05-17T00:00:00Z | 2017-06-05T00:00:00Z |
|  5 | 10000193858 | 10000225875 | 2017-06-05T00:00:00Z | 2017-06-13T00:00:00Z |
|  6 | 10000193858 | 10000225875 | 2017-06-13T00:00:00Z | 2017-08-16T00:00:00Z |
|  8 | 10000193858 | 10000225875 | 2017-08-16T00:00:00Z | 2017-08-18T00:00:00Z |
|  9 | 10000193858 | 10000225875 | 2017-08-18T00:00:00Z | 2017-08-31T00:00:00Z |
| 11 | 10000193858 | 10000225875 | 2017-08-31T00:00:00Z | 2018-01-05T00:00:00Z |
| 13 | 10000193858 | 10000225875 | 2018-01-05T00:00:00Z |               (null) |
|  1 | 10000351067 | 10000232561 | 2010-04-08T00:00:00Z | 2010-08-06T00:00:00Z |
|  2 | 10000351067 | 10000232561 | 2016-09-09T00:00:00Z | 2016-09-16T00:00:00Z |
|  3 | 10000351067 | 10000232561 | 2016-09-16T00:00:00Z | 2016-10-03T00:00:00Z |
|  4 | 10000351067 | 10000232561 | 2016-10-03T00:00:00Z | 2016-10-07T00:00:00Z |
|  5 | 10000351067 | 10000232561 | 2016-10-07T00:00:00Z | 2017-01-17T00:00:00Z |
|  6 | 10000351067 | 10000232561 | 2017-01-17T00:00:00Z | 2018-04-05T00:00:00Z |
|  8 | 10000351067 | 10000232561 | 2018-04-05T00:00:00Z |               (null) |

Upvotes: 0

uzi
uzi

Reputation: 4146

It seems that you are trying to exclude rows which are covered by other rows. If so, try this query

select
    *
from
    myTable a
where
    not exists (
        select 1
        from myTable b
        where
            a.Person_id = b.Person_id and a.Store_ID = b.Store_ID
            and a.Startdate > b.Startdate and a.enddate < b.enddate
    )

Upvotes: 0

The Impaler
The Impaler

Reputation: 48770

Here you go (example in PostgreSQL):

with attendance_ext as (
select a.*, (select min(startdate) from attendance x 
  where x.startdate > a.startdate) as nextstart
  from attendance a)
select * from attendance_ext
  where (startdate + interval '24 hours') < nextstart;

Assumming your table is like:

create table attendance (
  person_id int, 
  store_id int, 
  startdate timestamp, 
  enddate timestamp
);

insert into attendance (person_id, store_id, startdate, enddate) 
  values (1, 1, '2010-04-08', '2010-08-06');
insert into attendance (person_id, store_id, startdate, enddate) 
  values (1, 1, '2016-09-09', '2016-09-16');
insert into attendance (person_id, store_id, startdate, enddate) 
  values (1, 1, '2016-09-16', '2016-10-03');
insert into attendance (person_id, store_id, startdate, enddate) 
  values (1, 1, '2016-10-03', '2016-10-07');

Upvotes: 0

Related Questions