Reputation: 108
I have a table passenger_log
with four columns psngr_id
(not necessarily unique), arrival_dt_tm
, departure_dt_tm
and timediff
(the difference between the passengers last departure date and next arrival date).
From this table I'd like to be able to select only rows where the time difference is less than 24 hours and also select the next row.
I am on SQL 2008 so can’t really use the latest versions LEAD
and LAG
functions.
Existing table -
local_passenger_id arrival_date_time departure_date_time timediff
-----------------------------------------------------------------
00F9P0L193 28/07/2013 23:30 28/07/2013 23:36 0.516666
00F9P0L193 29/07/2013 00:07 29/07/2013 03:16 NULL
01MLEMFDUK 12/10/2008 16:43 12/10/2008 20:28 21.116666
01MLEMFDUK 13/10/2008 17:35 13/10/2008 20:19 3889.9
01MLEMFDUK 24/03/2009 22:13 25/03/2009 01:23 32268
01MLEMFDUK 28/11/2012 13:23 28/11/2012 14:10 676.583333
01MLEMFDUK 26/12/2012 18:45 26/12/2012 19:27 20.433333
01MLEMFDUK 27/12/2012 15:53 27/12/2012 17:07 289.2
01MLEMFDUK 08/01/2013 18:19 08/01/2013 19:00 NULL
02CJHH6KAJ 27/02/2011 09:48 27/02/2011 11:56 10167.25
02CJHH6KAJ 26/04/2012 03:11 26/04/2012 06:42 44.566666
02CJHH6KAJ 28/04/2012 03:16 28/04/2012 07:06 23.233333
02CJHH6KAJ 29/04/2012 06:20 29/04/2012 09:45 NULL
Desired output -
local_passenger_id arrival_date_time departure_date_time timediff_mins
----------------------------------------------------------------------
00F9P0L193 28/07/2013 23:30 28/07/2013 23:36 0.516666
00F9P0L193 29/07/2013 00:07 29/07/2013 03:16 NULL
01MLEMFDUK 12/10/2008 16:43 12/10/2008 20:28 21.116666
01MLEMFDUK 13/10/2008 17:35 13/10/2008 20:19 3889.9
01MLEMFDUK 26/12/2012 18:45 26/12/2012 19:27 20.433333
01MLEMFDUK 27/12/2012 15:53 27/12/2012 17:07 289.2
02CJHH6KAJ 28/04/2012 03:16 28/04/2012 07:06 23.233333
02CJHH6KAJ 29/04/2012 06:20 29/04/2012 09:45 NULL
Upvotes: 1
Views: 278
Reputation: 28236
Try this
WITH cte(id,pid,adt,ddt,tdif) AS
(SELECT ROW_NUMBER() OVER (ORDER BY pid,adt), pid,adt,ddt,tdif FROM pass),
hit(id) AS
(SELECT id FROM cte c1 WHERE EXISTS
(SELECT 1 FROM cte c2 WHERE c2.id=c1.id+1 AND c2.pid=c1.pid AND c1.tdif<24) )
SELECT pid,adt,ddt,tdif FROM cte WHERE id IN
(SELECT id FROM hit UNION SELECT id+1 FROM hit)
Based on table
INSERT INTO pass
([pid], [adt], [ddt], [tdif])
VALUES
('00F9P0L193', '28.07.2013 23:30', '28.07.2013 23:36', '0.516666'),
('00F9P0L193', '29.07.2013 00:07', '29.07.2013 03:16', NULL),
('01MLEMFDUK', '12.10.2008 16:43', '12.10.2008 20:28', '21.116666'),
('01MLEMFDUK', '13.10.2008 17:35', '13.10.2008 20:19', '3889.9'),
('01MLEMFDUK', '24.03.2009 22:13', '25.03.2009 01:23', '32268'),
('01MLEMFDUK', '28.11.2012 13:23', '28.11.2012 14:10', '676.583333'),
('01MLEMFDUK', '26.12.2012 18:45', '26.12.2012 19:27', '20.433333'),
('01MLEMFDUK', '27.12.2012 15:53', '27.12.2012 17:07', '289.2'),
('01MLEMFDUK', '08.01.2013 18:19', '08.01.2013 19:00', NULL),
('02CJHH6KAJ', '27.02.2011 09:48', '27.02.2011 11:56', '10167.25'),
('02CJHH6KAJ', '26.04.2012 03:11', '26.04.2012 06:42', '44.566666'),
('02CJHH6KAJ', '28.04.2012 03:16', '28.04.2012 07:06', '23.233333'),
('02CJHH6KAJ', '29.04.2012 06:20', '29.04.2012 09:45', NULL)
;
Common table expressions:
cte
is a numbered list (I simly used row_number()
over all rows, regardless of passenger id)hit
is a table that contains only those rows where there is a following row with a time difference of <24h by the same passenger.In the main select
I bind together the rows from hit
and the immediately following row by using a SUB-SELECT
with UNION
output:
pid, adt, ddt, tdif
00F9P0L193 28.07.2013 23:30 28.07.2013 23:36 0.516666
00F9P0L193 29.07.2013 00:07 29.07.2013 03:16
01MLEMFDUK 12.10.2008 16:43 12.10.2008 20:28 21.116666
01MLEMFDUK 13.10.2008 17:35 13.10.2008 20:19 3889.9
01MLEMFDUK 26.12.2012 18:45 26.12.2012 19:27 20.433333
01MLEMFDUK 27.12.2012 15:53 27.12.2012 17:07 289.2
02CJHH6KAJ 28.04.2012 03:16 28.04.2012 07:06 23.233333
02CJHH6KAJ 29.04.2012 06:20 29.04.2012 09:45
The same can also be done with just one CTE:
;WITH cte AS
(SELECT ROW_NUMBER() OVER (ORDER BY pid,adt) id,pid,adt,ddt,tdif FROM pass)
SELECT pid,adt,ddt,tdif FROM cte c0 WHERE EXISTS (
SELECT 1 FROM cte c1 WHERE c1.id IN(c0.id,c0.id-1) AND EXISTS
(SELECT 1 FROM cte c2 WHERE c2.id=c1.id+1 AND c2.pid=c1.pid AND c1.tdif<24) )
Probably not that easy to read, but works just the same ...
Upvotes: 1
Reputation: 6020
SELECT T.local_passenger_id, T.arrival_date_time, T.departure_date_time, T.timediff FROM passenger_log T
WHERE T.timediff < 24
UNION
SELECT T2.local_passenger_id, T2.arrival_date_time, T2.departure_date_time, T2.timediff FROM
(SELECT *, RANK() OVER (PARTITION BY local_passenger_id ORDER BY arrival_date_time) AS Ranked FROM passenger_log) T1
LEFT JOIN
(SELECT *, RANK() OVER (PARTITION BY local_passenger_id ORDER BY arrival_date_time) AS Ranked FROM passenger_log) T2
ON T1.local_passenger_id = T2.local_passenger_id AND T1.Ranked + 1 = T2.Ranked
WHERE T1.timediff < 24
ORDER BY local_passenger_id, arrival_date_time
Upvotes: 0