Reputation: 37
This is possibly a rather simple Question, but nevertheless a challenge for me. I have a table with 4 fields Person_id, Store_id, startdate and enddate. For a particular value for person_id, there can a number of records with different start and end dates. Here I need to find the time gap if it is more than 24 hours of current end date to next start date. The issue here is for each person_id there may records with overlapping periods.
Example1:
Person_id Store_ID Startdate enddate
10000351067 10000232561 2010-04-08 2010-08-06
10000351067 10000232561 2016-09-09 2016-09-16
10000351067 10000232561 2016-09-16 2016-10-03
10000351067 10000232561 2016-10-03 2016-10-07
10000351067 10000232561 2016-10-07 2017-01-17
10000351067 10000232561 2017-01-17 2018-04-05
**10000351067 10000232561 2017-06-16 2017-06-20**
10000351067 10000232561 2018-04-05 NULL
Example2:
10000193858 10000225875 2016-07-13 2016-08-03
10000193858 10000225875 2016-08-03 2017-05-17
10000193858 10000225875 2017-05-17 2017-06-05
10000193858 10000225875 2017-05-31 2017-06-05
10000193858 10000225875 2017-06-05 2017-06-13
10000193858 10000225875 2017-06-13 2017-08-16
10000193858 10000225875 2017-08-07 2017-08-16
10000193858 10000225875 2017-08-16 2017-08-18
10000193858 10000225875 2017-08-18 2017-08-31
10000193858 10000225875 2017-08-31 2018-01-05
**10000193858 10000225875 2017-11-13 2017-11-20**
10000193858 10000225875 2018-01-05 NULL
The record with the oldest 'startdate' needs to be retained in all cases. In the situation where there are multiple records with the same oldest 'startdate', the one with the largest 'enddate' needs to be retained. I tried by using a below query, but was unsuccessful (Possibly I have done something wrong).
CREATE TABLE #ordered_removal_list(
[ID_New] [int] IDENTITY(1,1) NOT NULL,
[person_id] [bigint] NULL,
[Store_ID] [bigint] NULL,
[started_at] [datetime] NULL,
[ended_at] [datetime] NULL,
)
INSERT INTO #ordered_removal_list
(person_id,Store_ID,started_at,ended_at)
SELECTperson_id,Store_ID,started_at,ended_at into #test FROM Temp_Data
;WITH cte
AS
(
SELECT ord1.person_id, ord1.started_at, ord1.ended_at, next1.started_at as next1_start,
Last1.started_at as last1_start, CASE WHEN DATEDIFF (HOUR, last1.ended_at, ord1.started_at) > 23 THEN 'GAP' ELSE 'NO_GAP' END as 'gap'
FROM #test ord1
LEFT JOIN #test next1 on next1.[ID_New] = ord1.[ID_New] + 1 and ord1.person_id = next1.person_id)
SELECT * FROM cte
where gap = 'GAP'
I was unable to exclude the overlapping dates which are marked in red color in above examples.Any suggestions (preferably code examples)
Result set:
Example 1) IF I exclude the overlapping dates 10000351067, 10000232561, 2017-06-16, 2017-06-20 then their is time gap between current record enddate to following next start date i.e., No time gap between whole time period as compared to very next period.
Example 2) IF I exclude the overlapping dates 10000193858, 10000225875, 2017-11-13, 2017-11-20 then their is time gap between current record enddate to following next start date i.e., No time gap between whole time period as compared to very next period.
Thanks!
Upvotes: 0
Views: 569
Reputation: 6449
Here's a solution that appears to work for the provided data. It works by checking that there are no overlapping records (p) that supersedes the current (c) record. An overlapping record supersedes the current record if it starts before the current record, or starts at the same time as the current record but ends after the current record NULLs in the start or end date columns are treated as the beginning or end of time respectively.
To determine record precedence add a row number column (RN) which also serves to prevent comparing a record with itself. When comparing records, the preceding record will have a row number less than the current record.
Once that's done all that remains is to check for overlaps See this SQL Fiddle for an example (please note I added 1 sample record with concurrent start times to your second data set to test for that condition):
with dta as (
select row_number()
over (partition by person_id, store_id
order by case when startdate is null then 1 else 0 end
, startdate
, case when enddate is null then 1 else 2 end
, enddate desc) rn
, a.*
from YourData a
)
select * from dta c
where not exists (
select 1 from dta p
where p.person_id = c.person_id
and p.store_id = c.store_id
-- Establish precedence
and p.rn < c.rn
-- Detect overlaps
and (p.startdate is null or p.startdate < c.enddate or c.enddate is null)
and (c.startdate is null or c.startdate < p.enddate or p.enddate is null)
)
order by Person_id, store_id, startdate
| rn | Person_id | Store_ID | Startdate | enddate |
|----|-------------|-------------|----------------------|----------------------|
| 1 | 10000193858 | 10000225875 | 2016-07-13T00:00:00Z | 2016-08-03T00:00:00Z |
| 2 | 10000193858 | 10000225875 | 2016-08-03T00:00:00Z | 2017-05-17T00:00:00Z |
| 3 | 10000193858 | 10000225875 | 2017-05-17T00:00:00Z | 2017-06-05T00:00:00Z |
| 5 | 10000193858 | 10000225875 | 2017-06-05T00:00:00Z | 2017-06-13T00:00:00Z |
| 6 | 10000193858 | 10000225875 | 2017-06-13T00:00:00Z | 2017-08-16T00:00:00Z |
| 8 | 10000193858 | 10000225875 | 2017-08-16T00:00:00Z | 2017-08-18T00:00:00Z |
| 9 | 10000193858 | 10000225875 | 2017-08-18T00:00:00Z | 2017-08-31T00:00:00Z |
| 11 | 10000193858 | 10000225875 | 2017-08-31T00:00:00Z | 2018-01-05T00:00:00Z |
| 13 | 10000193858 | 10000225875 | 2018-01-05T00:00:00Z | (null) |
| 1 | 10000351067 | 10000232561 | 2010-04-08T00:00:00Z | 2010-08-06T00:00:00Z |
| 2 | 10000351067 | 10000232561 | 2016-09-09T00:00:00Z | 2016-09-16T00:00:00Z |
| 3 | 10000351067 | 10000232561 | 2016-09-16T00:00:00Z | 2016-10-03T00:00:00Z |
| 4 | 10000351067 | 10000232561 | 2016-10-03T00:00:00Z | 2016-10-07T00:00:00Z |
| 5 | 10000351067 | 10000232561 | 2016-10-07T00:00:00Z | 2017-01-17T00:00:00Z |
| 6 | 10000351067 | 10000232561 | 2017-01-17T00:00:00Z | 2018-04-05T00:00:00Z |
| 8 | 10000351067 | 10000232561 | 2018-04-05T00:00:00Z | (null) |
Upvotes: 0
Reputation: 4146
It seems that you are trying to exclude rows which are covered by other rows. If so, try this query
select
*
from
myTable a
where
not exists (
select 1
from myTable b
where
a.Person_id = b.Person_id and a.Store_ID = b.Store_ID
and a.Startdate > b.Startdate and a.enddate < b.enddate
)
Upvotes: 0
Reputation: 48770
Here you go (example in PostgreSQL):
with attendance_ext as (
select a.*, (select min(startdate) from attendance x
where x.startdate > a.startdate) as nextstart
from attendance a)
select * from attendance_ext
where (startdate + interval '24 hours') < nextstart;
Assumming your table is like:
create table attendance (
person_id int,
store_id int,
startdate timestamp,
enddate timestamp
);
insert into attendance (person_id, store_id, startdate, enddate)
values (1, 1, '2010-04-08', '2010-08-06');
insert into attendance (person_id, store_id, startdate, enddate)
values (1, 1, '2016-09-09', '2016-09-16');
insert into attendance (person_id, store_id, startdate, enddate)
values (1, 1, '2016-09-16', '2016-10-03');
insert into attendance (person_id, store_id, startdate, enddate)
values (1, 1, '2016-10-03', '2016-10-07');
Upvotes: 0