Natim
Natim

Reputation: 18112

How to do this GROUP BY with the wanted result?

Basically, I have a table with all the bus stops of a route with the time_from_start value, that helps to put them in a good order.

CREATE TABLE `api_routestop` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `route_id` int(11) NOT NULL,
  `station_id` varchar(10) NOT NULL,
  `time_from_start` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `api_routestop_4fe3422a` (`route_id`),
  KEY `api_routestop_15e3331d` (`station_id`)
)

I want to return for each stop of a line the time to go to the next stop.

I tried with this QUERY :

SELECT r1.station_id, r2.station_id, r1.route_id, COUNT(*), (r2.time_from_start - r1.time_from_start) as time
FROM api_routestop r1
LEFT JOIN api_routestop r2 ON r1.route_id = r2.route_id AND r1.id <> r2.id
GROUP BY r1.station_id
HAVING time >= 0
ORDER BY r1.route_id, r1.time_from_start, r2.time_from_start

But the group by seams not to work and the result looks like :

+------------+------------+----------+----------+------+
| station_id | station_id | route_id | COUNT(*) | time |
+------------+------------+----------+----------+------+
| Rub01      | Sal01      |        1 |       16 |    1 |
| Lyc02      | Sch02      |        2 |       17 |    2 |
| Paq01      | PoB01      |        3 |       15 |    1 |
| LaT02      | Gco02      |        4 |       16 |    1 |
| Sup01      | Tur01      |        5 |      132 |    1 |
| Oeu02      | CtC02      |        6 |       20 |    2 |
| Ver02      | Elo02      |        7 |       38 |    1 |
| Can01      | Mbo01      |        8 |       70 |    1 |
| Ver01      | Elo01      |        9 |       77 |    1 |
| MCH01      | for02      |       10 |       77 |    1 |
+------------+------------+----------+----------+------+

If I do that :

SELECT r1.station_id, r2.station_id, r1.route_id, COUNT(*), (r2.time_from_start - r1.time_from_start) as time
FROM api_routestop r1
LEFT JOIN api_routestop r2 ON r1.route_id = r2.route_id AND r1.id <> r2.id
GROUP BY r1.station_id, r2.station_id, r1.route_id
HAVING time >= 0
ORDER BY r1.route_id, r1.time_from_start, r2.time_from_start

I am approching :

+------------+------------+----------+----------+------+
| station_id | station_id | route_id | COUNT(*) | time |
+------------+------------+----------+----------+------+
| Rub01      | Sal01      |        1 |        1 |    1 |
| Rub01      | ARM01      |        1 |        1 |    2 |
| Rub01      | MaV01      |        1 |        1 |    4 |
| Rub01      | COl01      |        1 |        1 |    5 |
| Rub01      | Str01      |        1 |        1 |    6 |
| Rub01      | Jau01      |        1 |        1 |    7 |
| Rub01      | Cdp01      |        1 |        1 |    9 |
| Rub01      | Rep01      |        1 |        1 |   11 |
| Rub01      | CoT01      |        1 |        1 |   12 |
| Rub01      | Ctr01      |        1 |        1 |   14 |
| Rub01      | FLy01      |        1 |        1 |   15 |
| Rub01      | Lib01      |        1 |        1 |   17 |
| Rub01      | Bru01      |        1 |        1 |   18 |
| Rub01      | Sch01      |        1 |        1 |   20 |
| Rub01      | Lyc01      |        1 |        1 |   22 |
| Rub01      | Res01      |        1 |        1 |   24 |
| Sal01      | ARM01      |        1 |        1 |    1 |
| Sal01      | MaV01      |        1 |        1 |    3 |
| Sal01      | COl01      |        1 |        1 |    4 |
| Sal01      | Str01      |        1 |        1 |    5 |
| Sal01      | Jau01      |        1 |        1 |    6 |
| Sal01      | Cdp01      |        1 |        1 |    8 |
| Sal01      | Rep01      |        1 |        1 |   10 |
| Sal01      | CoT01      |        1 |        1 |   11 |
| Sal01      | Ctr01      |        1 |        1 |   13 |
| Sal01      | FLy01      |        1 |        1 |   14 |
| Sal01      | Lib01      |        1 |        1 |   16 |
| Sal01      | Bru01      |        1 |        1 |   17 |
| Sal01      | Sch01      |        1 |        1 |   19 |
| Sal01      | Lyc01      |        1 |        1 |   21 |
...
3769 rows in set (0.07 sec)

But what do I have to do to have only the first result for the same r1.station_id and r1.route_id ?

Upvotes: 0

Views: 188

Answers (3)

Conrad Frix
Conrad Frix

Reputation: 52655

You're getting a lot of results back because your getting every stop joined to every other stop on the same route.

So you'll need to identify the "Next" stop as the stop that has the same route ID but has a minimum time from start later than the current one

Update Added routeId to the next_stop sub query to deal with the case of stations used in multiple routes

SELECT
    r1.station_id, 
    r2.station_id, 
    r1.route_id, 
    r2.time_from_start - r1.time_from_start as time

FROM
api_routestop r1 
INNER JOIN (SELECT 
         r1.station_id , r2.route_id, min(r2.time_from_start) next_time_from_start
FROM 
  api_routestop r1
  LEFT JOIN api_routestop r2 ON r1.route_id = r2.route_id AND r1.id <> r2.id
       and r2.time_from_start > r1.time_from_start
  GROUP BY r1.Station_id, r2.route_id) next_stop

ON r1.Station_id = next_stop.station_id
   and r1.route_id = next_stop.route_id
LEFT JOIN api_routestop r2 
ON r2.time_from_start = r2.next_time_from_start
    and r1.route_id = r2.route_id
AND r2.time_from_start > r1.time_from_start

Upvotes: 2

Peposh
Peposh

Reputation: 182

SELECT station_id, coalesce( 
   (SELECT time_from_start 
      FROM api_routestop t2 
     WHERE t2.time_from_start > t1.time_from_start 
       AND t2.time_from_start <= (SELECT time_from_start FROM api_routestop t5 WHERE t5.station_id = '4' AND t5.route_id=t1.route_id)
       AND t2.route_id = t1.route_id 
     ORDER BY t2.time_from_start LIMIT 1), time_from_start) - time_from_start AS difference 
 FROM api_routestop t1 
WHERE t1.route_id = 1
  AND t1.time_from_start >= (SELECT time_from_start FROM api_routestop t4 WHERE t4.station_id = '2' AND t4.route_id=t1.route_id)
  AND t1.time_from_start <= (SELECT time_from_start FROM api_routestop t5 WHERE t5.station_id = '4' AND t5.route_id=t1.route_id)
ORDER BY time_from_start

Upvotes: 2

Martin Smith
Martin Smith

Reputation: 453426

Are you open to changing the schema? If so simply adding a column containing a sequential integer for all stops on route will make this query a lot easier and more efficient.

Failing that this will do it.

SELECT
      station_id,
      route_id,
      time_from_start, 
      time_to_next
FROM
(
SELECT
    station_id,route_id,time_from_start,
    IF( @prev <> route_id, null, @time_from_start-time_from_start ) AS time_to_next,
    @time_from_start := time_from_start,
    @prev := route_id
  FROM api_routestop
  JOIN (SELECT @time_from_start := NULL, @prev := 0) AS r
  ORDER BY route_id, time_from_start DESC
  ) t
ORDER BY route_id,time_from_start

Upvotes: 1

Related Questions