Keith D
Keith D

Reputation: 29

T-SQL subselect statement is returning all rows instead of limiting to 1 based on subselect

I am trying to return just the first row where the BLOCK_STOP_ORDER = 2. What is wrong with my SQL? Why isn't WHERE SCHEDULE.BLOCK_STOP_ORDER = (SELECT MIN(S1.BLOCK_STOP_ORDER.... working? When I run the subselect on its own it returns the value '2' - doesn't that mean it should then limit the query result to only the row(s) where BLOCK_STOP_ORDER = 2?

SELECT      ROUTE.ROUTE_ABBR, SCHEDULE.ROUTE_DIRECTION_ID, SCHEDULE.PATTERN_ID, SCHEDULE.BLOCK_STOP_ORDER,
            SCHEDULE.SCHEDULED_TIME, GEO_NODE.GEO_NODE_ABBR, TRIP.TRIP_SEQUENCE AS TPST  
FROM        SCHEDULE 
 INNER JOIN GEO_NODE ON SCHEDULE.GEO_NODE_ID = GEO_NODE.GEO_NODE_ID
 INNER JOIN ROUTE    ON SCHEDULE.ROUTE_ID = ROUTE.ROUTE_ID 
 INNER JOIN TRIP     ON SCHEDULE.TRIP_ID = TRIP.TRIP_ID 
WHERE     (SCHEDULE.CALENDAR_ID = '120221024') AND ROUTE.ROUTE_ABBR = '001' 
    AND  SCHEDULE.ROUTE_DIRECTION_ID = '2'  AND SCHEDULE.PATTERN_ID = '270082'
    AND TRIP.TRIP_SEQUENCE = '18600' 

    AND SCHEDULE.BLOCK_STOP_ORDER =
      (SELECT MIN(S1.BLOCK_STOP_ORDER) 
        FROM SCHEDULE S1 
        WHERE SCHEDULE.CALENDAR_ID = S1.CALENDAR_ID 
           AND SCHEDULE.ROUTE_ID = S1.ROUTE_ID 
           AND SCHEDULE.ROUTE_DIRECTION_ID = S1.ROUTE_DIRECTION_ID 
           AND SCHEDULE.PATTERN_ID = S1.PATTERN_ID 
           AND SCHEDULE.SCHEDULED_TIME = S1.SCHEDULED_TIME  
           AND SCHEDULE.GEO_NODE_ID = S1.GEO_NODE_ID  
           AND SCHEDULE.BLOCK_STOP_ORDER = S1.BLOCK_STOP_ORDER 
           AND SCHEDULE.TRIP_ID = S1.TRIP_ID
        )
 
GROUP BY  ROUTE.ROUTE_ABBR, SCHEDULE.ROUTE_DIRECTION_ID,
    SCHEDULE.PATTERN_ID, SCHEDULE.SCHEDULED_TIME, 
    GEO_NODE.GEO_NODE_ABBR, SCHEDULE.BLOCK_STOP_ORDER, TRIP.TRIP_SEQUENCE 

ORDER BY ROUTE.ROUTE_ABBR, SCHEDULE.ROUTE_DIRECTION_ID, TRIP.TRIP_SEQUENCE

Results:

ROUTE_ABBR ROUTE_DIRECTION_ID PATTERN_ID BLOCK_STOP_ORDER SCHEDULED_TIME GEO_NODE_ABBR TPST
001 2 270082 2 18600 1251 18600
001 2 270082 3 18600 1346 18600
001 2 270082 5 18720 1123 18600
001 2 270082 6 18720 11372 18600
001 2 270082 4 18720 1570 18600
001 2 270082 8 18780 11373 18600

Upvotes: 1

Views: 51

Answers (1)

Joel Coehoorn
Joel Coehoorn

Reputation: 416179

This is probably better solved with the row_number() windowing function:

SELECT *
FROM (
    SELECT DISTINCT r.ROUTE_ABBR, s.ROUTE_DIRECTION_ID, s.PATTERN_ID, s.BLOCK_STOP_ORDER,
           s.SCHEDULED_TIME, g.GEO_NODE_ABBR, t.TRIP_SEQUENCE AS TPST,
           row_number() over (order by SCHEDULE.BLOCK_STOP_ORDER) rn
    FROM        SCHEDULE s 
     INNER JOIN GEO_NODE g ON s.GEO_NODE_ID = g.GEO_NODE_ID
     INNER JOIN ROUTE    r ON s.ROUTE_ID = r.ROUTE_ID 
     INNER JOIN TRIP     t ON s.TRIP_ID = t.TRIP_ID 

    WHERE     s.CALENDAR_ID = '120221024' AND r.ROUTE_ABBR = '001' 
        AND   s.ROUTE_DIRECTION_ID = '2'  AND s.PATTERN_ID = '270082'
        AND   t.TRIP_SEQUENCE = '18600' 
) t1
WHERE rn=1
ORDER BY t1.ROUTE_ABBR, t1.ROUTE_DIRECTION_ID, t1.TRIP_SEQUENCE

The problem with the original is the name SCHEDULE. For the full version of the query, the subquery is matching the name in the nested select with the instance of the table from the outer select. This correlates the results of the inner table with the outer, so only the item from that row of the outer table is eligible.

When you run the inner query by itself, separate from the outer query, there is only the one instance of the table. In that situation the WHERE conditions are matching the table to itself — they are always true — and you just get the smallest value of all the rows: 2.

This is why you should ALWAYS give ALL the tables in your queries an alias, and ONLY reference them by that alias (as I did in my answer). Do this, and the MIN() version can work... but will still be slower and more code than using row_number().

Finally, the use of DISTINCT / GROUP BY with every SELECT column is usually an indicator you don't fully understand the JOIN relationships used in the query, and in at least one case the join conditions are not sufficiently selective. I'd hesitate to move a query like that to production, even if it seems to be working, though I confess most of us have done it at some point anyway.

Upvotes: 3

Related Questions