Reputation: 961
I'm struggling with performance issue on my of my SQL query
I have a train journey traveling 5 stations named "A - B - C - D - E". A passenger book a ticket for only "B - C - D" ride. I need to retrieve all stations my passengers goes to.
What I have stored :
JOURNEY
+----+--------------------+-------------------+-------------------+-----------------+
| id | departure_datetime | arrival_datetime | departure_station | arrival_station |
+----+--------------------+-------------------+-------------------+-----------------+
| 1 | 2018-01-01 06:00 | 2018-01-01 10:00 | A | E |
+----+--------------------+-------------------+-------------------+-----------------+
BOOKING
+----+------------+-------------------+-----------------+
| id | journey_id | departure_station | arrival_station |
+----+------------+-------------------+-----------------+
| 1 | 1 | B | D |
+----+------------+-------------------+-----------------+
LEG
+----+------------+-------------------+-----------------+------------------+------------------+
| id | journey_id | departure_station | arrival_station | departure_time | arrival_time |
+----+------------+-------------------+-----------------+------------------+------------------+
| 1 | 1 | A | B | 2018-01-01 06:00 | 2018-01-01 07:00 |
| 2 | 1 | B | C | 2018-01-01 07:00 | 2018-01-01 08:00 |
| 3 | 1 | C | D | 2018-01-01 08:00 | 2018-01-01 09:00 |
| 4 | 1 | D | E | 2018-01-01 09:00 | 2018-01-01 10:00 |
+----+------------+-------------------+-----------------+------------------+------------------+
Only way I found to retrieve stations is :
select b.id as booking, l.departure_station, l.arrival_station
from JOURNEY j
inner join BOOKING b on j.id = b.journey_id
inner join LEG dl on (j.id = dl.journey_id and b.departure_station = dl.departure_station)
inner join LEG al on (j.id = al.journey_id and b.arrival_station = al.arrival_station)
inner join LEG l on (j.id = l.journey_id and l.departure_time >= dl.departure_time and l.arrival_time <= al.arrival_time)
where b.id = 1
But my LEG table is huge and doing this 3 joins on is very slow. Is there a way I can join only one time LEG table to increase performance ?
Intended return :
+------------+-------------------+-----------------+
| booking_id | departure_station | arrival_station |
+------------+-------------------+-----------------+
| 1 | B | C |
| 1 | C | D |
+------------+-------------------+-----------------+
I work on mariadb 12.2 so i have access to window function but i'm still not very comfortable with it.
Thanks.
EDIT : create tables :
CREATE TABLE `BOOKING` (
`id` INT(11) NOT NULL,
`journey_id` INT(11) NULL DEFAULT NULL,
`departure_station` VARCHAR(50) NULL DEFAULT NULL,
`arrival_station` VARCHAR(50) NULL DEFAULT NULL,
PRIMARY KEY (`id`)
);
CREATE TABLE `JOURNEY` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`departure_time` DATETIME NULL DEFAULT NULL,
`arrival_time` DATETIME NULL DEFAULT NULL,
`departure_station` VARCHAR(50) NULL DEFAULT NULL,
`arrival_station` VARCHAR(50) NULL DEFAULT NULL,
PRIMARY KEY (`id`)
);
CREATE TABLE `LEG` (
`id` INT(11) NOT NULL,
`journey_id` INT(11) NULL DEFAULT NULL,
`departure_station` VARCHAR(50) NULL DEFAULT NULL,
`arrival_station` VARCHAR(50) NULL DEFAULT NULL,
`departure_time` DATETIME NULL DEFAULT NULL,
`arrival_time` DATETIME NULL DEFAULT NULL,
PRIMARY KEY (`id`)
);
Upvotes: 0
Views: 119
Reputation: 5748
I would suggest using Common Table Expression (CTE):
WITH leg_cte as
(
SELECT l.* FROM leg l
JOIN booking b
ON l.journey_id = b.journey_id
WHERE b.id = 1
)
SELECT
b.id as booking,
l.departure_station,
l.arrival_station
FROM
booking b
JOIN leg_cte dl
ON b.departure_station = dl.departure_station
JOIN leg_cte al
ON b.arrival_station = al.arrival_station
JOIN leg_cte l
ON l.departure_time >= dl.departure_time AND l.arrival_time <= al.arrival_time
WHERE b.id = 1
Upvotes: 4
Reputation: 17289
I don't like your DB schema. But in your particular case, since you have your query working good for you. I would just create few indexes too speed up execution. In general there is nothing wrong when you need to join table few times to itself.
http://sqlfiddle.com/#!9/1a467/1
Try just add 4 indexes:
CREATE INDEX journey ON BOOKING (journey_id);
CREATE INDEX arrival ON LEG (journey_id, arrival_station);
CREATE INDEX departure ON LEG (journey_id, departure_station);
CREATE INDEX d_a_time ON LEG (journey_id, departure_time, arrival_time);
And run your query again, it should be much faster when using indexes.
Upvotes: 4
Reputation: 3821
Try it left join
and use REGEXP
to filiter departure_station and arrival_station
select T3.id booking_id , T1.departure_station,T1.arrival_station
from LEG T1
left join JOURNEY T2 on T1.`journey_id` = T2.`id`
and (T1.`departure_time` >= T2.`departure_datetime` and T1.`arrival_time` <= T2.`arrival_datetime`)
left join BOOKING T3 on T3.`id` = T2.`id`
and T1.departure_station REGEXP (CONCAT('[',T3.departure_station , '-' , T3.arrival_station,']' ))
and T1.arrival_station REGEXP (CONCAT('[',T3.departure_station , '-' , T3.arrival_station,']' ))
where T1.journey_id = 1 and T3.id is not null ;
| booking_id | departure_station | arrival_station |
|------------|-------------------|-----------------|
| 1 | B | C |
| 1 | C | D |
Test DDL:
CREATE TABLE JOURNEY
(`id` int, `departure_datetime` datetime, `arrival_datetime` datetime, `departure_station` varchar(1), `arrival_station` varchar(1))
;
INSERT INTO JOURNEY
(`id`, `departure_datetime`, `arrival_datetime`, `departure_station`, `arrival_station`)
VALUES
(1, '2018-01-01 06:00:00', '2018-01-01 10:00:00', 'A', 'E')
;
CREATE TABLE BOOKING
(`id` int, `journey_id` int, `departure_station` varchar(1), `arrival_station` varchar(1))
;
INSERT INTO BOOKING
(`id`, `journey_id`, `departure_station`, `arrival_station`)
VALUES
(1, 1, 'B', 'D')
;
CREATE TABLE LEG
(`id` int, `journey_id` int, `departure_station` varchar(1), `arrival_station` varchar(1), `departure_time` datetime, `arrival_time` datetime)
;
INSERT INTO LEG
(`id`, `journey_id`, `departure_station`, `arrival_station`, `departure_time`, `arrival_time`)
VALUES
(1, 1, 'A', 'B', '2018-01-01 06:00:00', '2018-01-01 07:00:00'),
(2, 1, 'B', 'C', '2018-01-01 07:00:00', '2018-01-01 08:00:00'),
(3, 1, 'C', 'D', '2018-01-01 08:00:00', '2018-01-01 09:00:00'),
(4, 1, 'D', 'E', '2018-01-01 09:00:00', '2018-01-01 10:00:00')
;
Upvotes: 1