Ace Amr
Ace Amr

Reputation: 108

Select only rows where time difference is x hours

I have a table passenger_log with four columns psngr_id (not necessarily unique), arrival_dt_tm, departure_dt_tm and timediff (the difference between the passengers last departure date and next arrival date).

From this table I'd like to be able to select only rows where the time difference is less than 24 hours and also select the next row.

I am on SQL 2008 so can’t really use the latest versions LEAD and LAG functions.

Existing table -

local_passenger_id arrival_date_time departure_date_time timediff
-----------------------------------------------------------------

00F9P0L193  28/07/2013 23:30    28/07/2013 23:36    0.516666
00F9P0L193  29/07/2013 00:07    29/07/2013 03:16    NULL
01MLEMFDUK  12/10/2008 16:43    12/10/2008 20:28    21.116666
01MLEMFDUK  13/10/2008 17:35    13/10/2008 20:19    3889.9
01MLEMFDUK  24/03/2009 22:13    25/03/2009 01:23    32268
01MLEMFDUK  28/11/2012 13:23    28/11/2012 14:10    676.583333
01MLEMFDUK  26/12/2012 18:45    26/12/2012 19:27    20.433333
01MLEMFDUK  27/12/2012 15:53    27/12/2012 17:07    289.2
01MLEMFDUK  08/01/2013 18:19    08/01/2013 19:00    NULL
02CJHH6KAJ  27/02/2011 09:48    27/02/2011 11:56    10167.25
02CJHH6KAJ  26/04/2012 03:11    26/04/2012 06:42    44.566666
02CJHH6KAJ  28/04/2012 03:16    28/04/2012 07:06    23.233333
02CJHH6KAJ  29/04/2012 06:20    29/04/2012 09:45    NULL

Desired output -

local_passenger_id arrival_date_time departure_date_time timediff_mins
----------------------------------------------------------------------

00F9P0L193  28/07/2013 23:30    28/07/2013 23:36    0.516666
00F9P0L193  29/07/2013 00:07    29/07/2013 03:16    NULL
01MLEMFDUK  12/10/2008 16:43    12/10/2008 20:28    21.116666
01MLEMFDUK  13/10/2008 17:35    13/10/2008 20:19    3889.9
01MLEMFDUK  26/12/2012 18:45    26/12/2012 19:27    20.433333
01MLEMFDUK  27/12/2012 15:53    27/12/2012 17:07    289.2
02CJHH6KAJ  28/04/2012 03:16    28/04/2012 07:06    23.233333
02CJHH6KAJ  29/04/2012 06:20    29/04/2012 09:45    NULL

Upvotes: 1

Views: 278

Answers (2)

Carsten Massmann
Carsten Massmann

Reputation: 28236

Try this

WITH cte(id,pid,adt,ddt,tdif) AS
(SELECT ROW_NUMBER() OVER (ORDER BY pid,adt), pid,adt,ddt,tdif FROM pass),
 hit(id) AS
(SELECT id FROM cte c1 WHERE EXISTS 
  (SELECT 1 FROM cte c2 WHERE c2.id=c1.id+1 AND c2.pid=c1.pid AND c1.tdif<24) )

SELECT pid,adt,ddt,tdif FROM cte WHERE id IN 
  (SELECT id FROM hit UNION SELECT id+1 FROM hit)

Based on table

INSERT INTO pass
    ([pid], [adt], [ddt], [tdif])
VALUES
    ('00F9P0L193', '28.07.2013 23:30', '28.07.2013 23:36', '0.516666'),
    ('00F9P0L193', '29.07.2013 00:07', '29.07.2013 03:16', NULL),
    ('01MLEMFDUK', '12.10.2008 16:43', '12.10.2008 20:28', '21.116666'),
    ('01MLEMFDUK', '13.10.2008 17:35', '13.10.2008 20:19', '3889.9'),
    ('01MLEMFDUK', '24.03.2009 22:13', '25.03.2009 01:23', '32268'),
    ('01MLEMFDUK', '28.11.2012 13:23', '28.11.2012 14:10', '676.583333'),
    ('01MLEMFDUK', '26.12.2012 18:45', '26.12.2012 19:27', '20.433333'),
    ('01MLEMFDUK', '27.12.2012 15:53', '27.12.2012 17:07', '289.2'),
    ('01MLEMFDUK', '08.01.2013 18:19', '08.01.2013 19:00', NULL),
    ('02CJHH6KAJ', '27.02.2011 09:48', '27.02.2011 11:56', '10167.25'),
    ('02CJHH6KAJ', '26.04.2012 03:11', '26.04.2012 06:42', '44.566666'),
    ('02CJHH6KAJ', '28.04.2012 03:16', '28.04.2012 07:06', '23.233333'),
    ('02CJHH6KAJ', '29.04.2012 06:20', '29.04.2012 09:45', NULL)
;

Common table expressions:

  • cte is a numbered list (I simly used row_number() over all rows, regardless of passenger id)
  • hit is a table that contains only those rows where there is a following row with a time difference of <24h by the same passenger.

In the main select I bind together the rows from hit and the immediately following row by using a SUB-SELECT with UNION

output:

pid,    adt,    ddt,    tdif
00F9P0L193  28.07.2013 23:30    28.07.2013 23:36    0.516666
00F9P0L193  29.07.2013 00:07    29.07.2013 03:16    
01MLEMFDUK  12.10.2008 16:43    12.10.2008 20:28    21.116666
01MLEMFDUK  13.10.2008 17:35    13.10.2008 20:19    3889.9
01MLEMFDUK  26.12.2012 18:45    26.12.2012 19:27    20.433333
01MLEMFDUK  27.12.2012 15:53    27.12.2012 17:07    289.2
02CJHH6KAJ  28.04.2012 03:16    28.04.2012 07:06    23.233333
02CJHH6KAJ  29.04.2012 06:20    29.04.2012 09:45    

The same can also be done with just one CTE:

;WITH cte AS 
(SELECT ROW_NUMBER() OVER (ORDER BY pid,adt) id,pid,adt,ddt,tdif FROM pass)
SELECT pid,adt,ddt,tdif FROM cte c0 WHERE EXISTS (
 SELECT 1 FROM cte c1 WHERE c1.id IN(c0.id,c0.id-1) AND EXISTS 
   (SELECT 1 FROM cte c2 WHERE c2.id=c1.id+1 AND c2.pid=c1.pid AND c1.tdif<24) )

Probably not that easy to read, but works just the same ...

Upvotes: 1

Wietze314
Wietze314

Reputation: 6020

SELECT T.local_passenger_id, T.arrival_date_time, T.departure_date_time, T.timediff FROM passenger_log T
WHERE T.timediff < 24
UNION
SELECT T2.local_passenger_id, T2.arrival_date_time, T2.departure_date_time, T2.timediff FROM 
(SELECT *, RANK() OVER (PARTITION BY local_passenger_id ORDER BY arrival_date_time) AS Ranked FROM passenger_log) T1
LEFT JOIN 
(SELECT *, RANK() OVER (PARTITION BY local_passenger_id ORDER BY arrival_date_time) AS Ranked FROM passenger_log) T2
ON T1.local_passenger_id = T2.local_passenger_id AND T1.Ranked + 1 = T2.Ranked
WHERE T1.timediff < 24

ORDER BY local_passenger_id, arrival_date_time

Upvotes: 0

Related Questions