antoine.lange
antoine.lange

Reputation: 961

SQL performance issue : find a route

I'm struggling with performance issue on my of my SQL query

I have a train journey traveling 5 stations named "A - B - C - D - E". A passenger book a ticket for only "B - C - D" ride. I need to retrieve all stations my passengers goes to.

What I have stored :

JOURNEY
+----+--------------------+-------------------+-------------------+-----------------+
| id | departure_datetime | arrival_datetime  | departure_station | arrival_station |
+----+--------------------+-------------------+-------------------+-----------------+
|  1 | 2018-01-01 06:00   | 2018-01-01 10:00  | A                 | E               |
+----+--------------------+-------------------+-------------------+-----------------+

BOOKING
+----+------------+-------------------+-----------------+
| id | journey_id | departure_station | arrival_station |
+----+------------+-------------------+-----------------+
|  1 |          1 | B                 | D               |
+----+------------+-------------------+-----------------+

LEG
+----+------------+-------------------+-----------------+------------------+------------------+
| id | journey_id | departure_station | arrival_station |  departure_time  |   arrival_time   |
+----+------------+-------------------+-----------------+------------------+------------------+
|  1 |          1 | A                 | B               | 2018-01-01 06:00 | 2018-01-01 07:00 |
|  2 |          1 | B                 | C               | 2018-01-01 07:00 | 2018-01-01 08:00 |
|  3 |          1 | C                 | D               | 2018-01-01 08:00 | 2018-01-01 09:00 |
|  4 |          1 | D                 | E               | 2018-01-01 09:00 | 2018-01-01 10:00 |
+----+------------+-------------------+-----------------+------------------+------------------+

Only way I found to retrieve stations is :

select b.id as booking, l.departure_station, l.arrival_station
from JOURNEY j
inner join BOOKING b on j.id = b.journey_id
inner join LEG dl on (j.id = dl.journey_id and b.departure_station = dl.departure_station)
inner join LEG al on (j.id = al.journey_id and b.arrival_station = al.arrival_station)
inner join LEG l on (j.id = l.journey_id and l.departure_time >= dl.departure_time and l.arrival_time <= al.arrival_time)
where b.id = 1

But my LEG table is huge and doing this 3 joins on is very slow. Is there a way I can join only one time LEG table to increase performance ?

Intended return :

+------------+-------------------+-----------------+
| booking_id | departure_station | arrival_station |
+------------+-------------------+-----------------+
|          1 | B                 | C               |
|          1 | C                 | D               |
+------------+-------------------+-----------------+

I work on mariadb 12.2 so i have access to window function but i'm still not very comfortable with it.

Thanks.

EDIT : create tables :

CREATE TABLE `BOOKING` (
    `id` INT(11) NOT NULL,
    `journey_id` INT(11) NULL DEFAULT NULL,
    `departure_station` VARCHAR(50) NULL DEFAULT NULL,
    `arrival_station` VARCHAR(50) NULL DEFAULT NULL,
    PRIMARY KEY (`id`)
);

CREATE TABLE `JOURNEY` (
    `id` INT(11) NOT NULL AUTO_INCREMENT,
    `departure_time` DATETIME NULL DEFAULT NULL,
    `arrival_time` DATETIME NULL DEFAULT NULL,
    `departure_station` VARCHAR(50) NULL DEFAULT NULL,
    `arrival_station` VARCHAR(50) NULL DEFAULT NULL,
    PRIMARY KEY (`id`)
);

CREATE TABLE `LEG` (
    `id` INT(11) NOT NULL,
    `journey_id` INT(11) NULL DEFAULT NULL,
    `departure_station` VARCHAR(50) NULL DEFAULT NULL,
    `arrival_station` VARCHAR(50) NULL DEFAULT NULL,
    `departure_time` DATETIME NULL DEFAULT NULL,
    `arrival_time` DATETIME NULL DEFAULT NULL,
    PRIMARY KEY (`id`)
);

Upvotes: 0

Views: 119

Answers (3)

Shuwn Yuan Tee
Shuwn Yuan Tee

Reputation: 5748

I would suggest using Common Table Expression (CTE):

WITH leg_cte as
(
    SELECT l.* FROM leg l
        JOIN booking b
            ON l.journey_id = b.journey_id
    WHERE b.id = 1
)

SELECT
    b.id as booking, 
    l.departure_station, 
    l.arrival_station
FROM
    booking b
    JOIN leg_cte dl
        ON b.departure_station = dl.departure_station
    JOIN leg_cte al 
        ON b.arrival_station = al.arrival_station
    JOIN leg_cte l 
        ON l.departure_time >= dl.departure_time AND l.arrival_time <= al.arrival_time

WHERE b.id = 1

Upvotes: 4

Alex
Alex

Reputation: 17289

I don't like your DB schema. But in your particular case, since you have your query working good for you. I would just create few indexes too speed up execution. In general there is nothing wrong when you need to join table few times to itself.

http://sqlfiddle.com/#!9/1a467/1

Try just add 4 indexes:

CREATE INDEX journey ON BOOKING (journey_id);
CREATE INDEX arrival ON LEG (journey_id, arrival_station);
CREATE INDEX departure ON LEG (journey_id, departure_station);
CREATE INDEX d_a_time ON LEG (journey_id, departure_time, arrival_time);

And run your query again, it should be much faster when using indexes.

Upvotes: 4

Wei Lin
Wei Lin

Reputation: 3821

Try it left join and use REGEXP to filiter departure_station and arrival_station

select T3.id booking_id , T1.departure_station,T1.arrival_station
from LEG T1
left join JOURNEY T2 on T1.`journey_id` = T2.`id`
  and (T1.`departure_time` >= T2.`departure_datetime` and T1.`arrival_time` <= T2.`arrival_datetime`)
left join BOOKING T3 on  T3.`id` = T2.`id` 
    and T1.departure_station REGEXP (CONCAT('[',T3.departure_station , '-' , T3.arrival_station,']' ))
    and T1.arrival_station REGEXP (CONCAT('[',T3.departure_station , '-' , T3.arrival_station,']' ))
where T1.journey_id = 1 and T3.id is not null ;

SQL Fiddle Demo Link

| booking_id | departure_station | arrival_station |
|------------|-------------------|-----------------|
|          1 |                 B |               C |
|          1 |                 C |               D |

Test DDL:

CREATE TABLE JOURNEY
    (`id` int, `departure_datetime` datetime, `arrival_datetime` datetime, `departure_station` varchar(1), `arrival_station` varchar(1))
;

INSERT INTO JOURNEY
    (`id`, `departure_datetime`, `arrival_datetime`, `departure_station`, `arrival_station`)
VALUES
    (1, '2018-01-01 06:00:00', '2018-01-01 10:00:00', 'A', 'E')
;


CREATE TABLE BOOKING
    (`id` int, `journey_id` int, `departure_station` varchar(1), `arrival_station` varchar(1))
;

INSERT INTO BOOKING
    (`id`, `journey_id`, `departure_station`, `arrival_station`)
VALUES
    (1, 1, 'B', 'D')
;


CREATE TABLE LEG
    (`id` int, `journey_id` int, `departure_station` varchar(1), `arrival_station` varchar(1), `departure_time` datetime, `arrival_time` datetime)
;

INSERT INTO LEG
    (`id`, `journey_id`, `departure_station`, `arrival_station`, `departure_time`, `arrival_time`)
VALUES
    (1, 1, 'A', 'B', '2018-01-01 06:00:00', '2018-01-01 07:00:00'),
    (2, 1, 'B', 'C', '2018-01-01 07:00:00', '2018-01-01 08:00:00'),
    (3, 1, 'C', 'D', '2018-01-01 08:00:00', '2018-01-01 09:00:00'),
    (4, 1, 'D', 'E', '2018-01-01 09:00:00', '2018-01-01 10:00:00')
;

Upvotes: 1

Related Questions