OCDan
OCDan

Reputation: 1153

Find Segment with Longest Stay Per Booking

We have a number of bookings and one of the requirements is that we display the Final Destination for a booking based on its segments. Our business has defined the Final Destination as that in which we have the longest stay. And Origin being the first departure point.

Please note this is not the segments with the Longest Travel time i.e. Datediff(minute, DepartDate, ArrivalDate) This is requesting the one with the Longest gap between segments.

This is a simplified version of the tables:

Create Table Segments
(
  BookingID int,
  SegNum int,
  DepartureCity varchar(100),
  DepartDate datetime,
  ArrivalCity varchar(100),
  ArrivalDate datetime
);

Create Table Bookings
(
 BookingID int identity(1,1),
 Locator varchar(10)
);

Insert into Segments values (1,2,'BRU','2010-03-06 10:40','FIH','2010-03-06 20:20:00')
Insert into Segments values (1,4,'FIH','2010-03-13 21:50:00','BRU', '2010-03-14 07:25:00')
Insert into Segments values (2,2,'BOD','2010-02-10 06:50:00','AMS','2010-02-10 08:50:00')
Insert into Segments values (2,3,'AMS','2010-02-10 10:40:00','EBB','2010-02-10 20:40:00')
Insert into Segments values (2,4,'EBB','2010-02-28 22:55:00','AMS','2010-03-01 05:35:00')
Insert into Segments values (2,5,'AMS','2010-03-01 10:25:00','BOD','2010-03-01 12:15:00')
insert into Segments values (3,2,'BRU','2010-03-09 12:10:00','IAD','2010-03-09 14:46:00')
Insert into Segments Values  (3,3,'IAD','2010-03-13 17:57:00','BRU','2010-03-14 07:15:00')
insert into segments values (4,2,'BRU','2010-07-27','ADD','2010-07-28')
insert into segments values (4,4,'ADD','2010-07-28','LUN','2010-07-28')
insert into segments values (4,5,'LUN','2010-08-23','ADD','2010-08-23')
insert into segments values (4,6,'ADD','2010-08-23','BRU','2010-08-24')


Insert into Bookings values('5MVL7J')
Insert into Bookings values ('Y2IMXQ')
insert into bookings values ('YCBL5C')
Insert into bookings values ('X7THJ6')

I have created a SQL Fiddle with real data here: SQL Fiddle Example

I have tried to do the following, however this doesn't appear to be correct.

 SELECT Locator, fd.*
FROM Bookings ob
OUTER APPLY
(
SELECT Top 1 DepartureCity, ArrivalCity
from
(
SELECT DISTINCT
    seg.segnum ,
    seg.DepartureCity ,
    seg.DepartDate ,
    seg.ArrivalCity ,
    seg.ArrivalDate,
(SELECT
DISTINCT
    DATEDIFF(MINUTE , seg.ArrivalDate , s2.DepartDate)
FROM Segments s2
WHERE s2.BookingID = seg.BookingID AND s2.segnum = seg.segnum + 1) 'LengthOfStay'
    FROM Bookings b(NOLOCK)
    INNER JOIN Segments seg (NOLOCK) ON seg.bookingid = b.bookingid
    WHERE b.Locator = ob.locator
  ) a
Order by a.lengthofstay desc
  )
FD

The results I expect are:

Locator   Origin   Destination 

5MVL7J    BRU      FIH

Y2IMXQ    BOD      EBB

YCBL5C    BRU      IAD

X7THJ6    BRU      LUN

I get the feeling that a CTE would be the best approach, however my attempts do this so far failed miserably. Any help would be greatly appreciated.

I have managed to get the following query working but it only works for one at a time due to the top one, but I'm not sure how to tweak it:

WITH CTE AS 
(
    SELECT distinct s.DepartureCity, s.DepartDate, s.ArrivalCity, s.ArrivalDate, b.Locator , ROW_NUMBER() OVER (PARTITION BY b.Locator ORDER BY SegNum ASC) RN 
    FROM Segments s
    JOIN bookings b ON s.bookingid = b.BookingID
)
SELECT C.Locator, c.DepartureCity, a.ArrivalCity
FROM 
(
SELECT TOP 1 C.Locator, c.ArrivalCity, c1.DepartureCity, DATEDIFF(MINUTE,c.ArrivalDate, c1.DepartDate) 'ddiff'
FROM CTE c
JOIN cte c1 ON c1.Locator = C.Locator AND c1.rn = c.rn + 1
ORDER BY ddiff DESC
) a
JOIN CTE c ON C.Locator = a.Locator
WHERE c.rn = 1

Upvotes: 3

Views: 145

Answers (3)

Oleksandr Fedorenko
Oleksandr Fedorenko

Reputation: 16904

You can use the OUTER APPLY + TOP operators to find the next values SegNum. After finding the gap between segments are used MIN/MAX aggregate functions with OVER clause as conditions in the CASE expression

;WITH cte AS
 (
  SELECT seg.BookingID,
         CASE WHEN MIN(seg.segNum) OVER(PARTITION BY seg.BookingID) = seg.segNum 
              THEN seg.DepartureCity END AS Origin,
         CASE WHEN MAX(DATEDIFF(MINUTE, seg.ArrivalDate, o.DepartDate)) OVER(PARTITION BY seg.BookingID) 
           = DATEDIFF(MINUTE, seg.ArrivalDate, o.DepartDate)
              THEN o.DepartureCity END AS Destination
  FROM Segments seg (NOLOCK)
    OUTER APPLY (
                 SELECT TOP 1 seg2.DepartDate, seg2.DepartureCity
                 FROM Segments seg2
                 WHERE seg.BookingID = seg2.BookingID 
                   AND seg.SegNum < seg2.SegNum
                 ORDER BY seg2.SegNum ASC
                 ) o
  )
  SELECT b.Locator, MAX(c.Origin) AS Origin, MAX(c.Destination) AS Destination
  FROM cte c JOIN Bookings b ON c.BookingID = b.BookingID
  GROUP BY b.Locator

See demo on SQLFiddle

Upvotes: 3

Nenad Zivkovic
Nenad Zivkovic

Reputation: 18559

You can try something like this:

;WITH CTE_Start AS 
(
    --Ordering of segments to eliminate gaps
    SELECT *, ROW_NUMBER() OVER (PARTITION BY BookingID ORDER BY SegNum) RN 
    FROM dbo.Segments  
)
, RCTE_Stay AS 
(
    --recursive CTE to calculate stay between segments
    SELECT *, 0 AS Stay FROM CTE_Start s WHERE RN = 1
    UNION ALL
    SELECT sNext.*, DATEDIFF(Mi, s.ArrivalDate, sNext.DepartDate) 
    FROM CTE_Start sNext
    INNER JOIN RCTE_Stay s ON s.RN + 1 = sNext.RN AND s.BookingID = sNext.BookingID
)
, CTE_Final AS
(
    --Search for max(stay) for each bookingID
    SELECT *, ROW_NUMBER() OVER (PARTITION BY BookingID ORDER BY Stay DESC) AS RN_Stay 
    FROM RCTE_Stay
)
--join Start and Final on RN=1 to find origin and departure
SELECT b.Locator, s.DepartureCity AS Origin, f.DepartureCity AS Destination
FROM CTE_Final f
INNER JOIN CTE_Start s ON f.BookingID = s.BookingID
INNER JOIN dbo.Bookings b ON b.BookingID = f.BookingID
WHERE s.RN = 1 AND f.RN_Stay = 1

SQLFiddle DEMO

Upvotes: 3

gotqn
gotqn

Reputation: 43636

The statement below:

;WITH DataSource AS
(

  SELECT ROW_NUMBER() OVER(PARTITION BY BookingID ORDER BY DATEDIFF(SS,DepartDate,ArrivalDate) DESC) AS Row
        ,Segments.BookingID
        ,Segments.SegNum
        ,Segments.DepartureCity
        ,Segments.DepartDate
        ,Segments.ArrivalCity
        ,Segments.ArrivalDate
        ,DATEDIFF(SS,DepartDate,ArrivalDate) AS DiffInSeconds
  FROM Segments
)
SELECT * 
FROM DataSource DS
INNER JOIN Bookings B
  ON DS.[BookingID] = B.[BookingID]

Will give the following output:

enter image description here

So, adding the following clause to the above statement:

WHERE Row = 1

will give you what you need.

Few important things:

  1. As you can see from the screenshot below, there are two records with same difference in second. If you want to show both of them (or all of them if there are), instead ROW_NUMBER function use RANK function.

  2. The return type of DATEDIFF is INT. So, there is limitation for seconds max deference value. It is as follows:

If the return value is out of range for int (-2,147,483,648 to +2,147,483,647), an error is returned. For millisecond, the maximum difference between startdate and enddate is 24 days, 20 hours, 31 minutes and 23.647 seconds. For second, the maximum difference is 68 years.

Upvotes: 0

Related Questions