GordonM
GordonM

Reputation: 31740

MySQL: Referencing subquery columns

I'm currently working on some reporting for an eCommerce system that needs to show how long an order has been delayed. I have a log of projected delivery dates for the order and am able to get the initial and last dates, but doing more complex things like comparing the projected dates is proving problematic.

A (simplified version of) my query is as follows:

SELECT orders.order_id,
       orders.date_dispatched AS actual_dispatch_date,
       (
           SELECT projected_date 
           FROM order_delivery_projections
           WHERE order_id = orders.order_id
           ORDER BY order_delivery_projection_id ASC
           LIMIT 1
       ) AS initial_delivery_projection,
       (
           SELECT projected_date 
           FROM order_delivery_projections
           WHERE order_id = orders.order_id
           ORDER BY order_delivery_projection_id DESC
           LIMIT 1
       ) AS final_delivery_projection
FROM orders
-- and some other joins here for additional report data

(Just FYI, I'm sorting on IDs rather than dates here because dispatch projections can move forward as well as back, for example if a stock shipment comes in ahead of schedule).

This is fine for extracting some raw data about the projected delivery history for an order, but I want to do some additional analysis of the data, for example how many days difference there is between the initial projected shipping date, the final projected shipping date and the actual shipping date. This is where I run into trouble.

I tried adding a DATEDIFF(final_delivery_projection, initial_delivery_projection) column to my SELECT clause in order to see how many days back a given delivery prediction has slipped, but MySQL wouldn't have it.

SELECT orders.order_id,
       orders.date_dispatched AS actual_dispatch_date,
       (
           SELECT projected_date 
           FROM order_delivery_projections
           WHERE order_id = orders.order_id
           ORDER BY order_delivery_projection_id ASC
           LIMIT 1
       ) AS initial_delivery_projection,
       (
           SELECT projected_date 
           FROM order_delivery_projections
           WHERE order_id = orders.order_id
           ORDER BY order_delivery_projection_id DESC
           LIMIT 1
       ) AS final_delivery_projection,
       DATEDIFF(final_delivery_projection - initial_delivery_projection) AS projection_days_revised
FROM orders
-- and some other joins here for additional report data

Unknown column final_delivery_projection in field list

Presumably you can't use an alias in a select statement if the alias is referring to a subselect in the same statement.

I would also like to be able to exclude orders in the WHERE clause based on the results of the projections. For example, I'd like to exclude all orders where the final projected shipping date falls before the initial projected date on the grounds that I'm only interested in orders that are being held up rather than ones that have shipped ahead of schedule.

Is extracting and processing the data I'm trying to get in the same statement possible, or will I have to do some post-processing in the client to work things like this out? If it is possible in SQL, then how can it be done?

Upvotes: 1

Views: 389

Answers (3)

GordonM
GordonM

Reputation: 31740

I spent a lot of time researching this issue, and the problem is a fundamental issue with MySQL that means you can't refer to an outer query in an inner query if the inner query is in the FROM portion of the query, even though you can refer to the outer query if the inner query is in the SELECT portion.

As for getting the first and last records from the delivery date history, I've discovered this query which produces the correct results with acceptable performance.

SELECT odr.order_id,
    first_change.projected_date AS initial_projected_dispatch_date,
    last_change.projected_date AS final_projected_dispatch_date
FROM order AS odr
LEFT JOIN (
    SELECT 
        order_id, 
        MIN(order_delivery_projection_id) AS first_id, 
        MAX(order_delivery_projection_id) AS last_id
    FROM order_delivery_projections
    GROUP BY order_id
) AS change_record_finder ON change_record_finder.order_id = odr.order_id
LEFT JOIN order_delivery_projections AS first_change ON first_change.order_delivery_projection_id = change_record_finder.first_id
LEFT JOIN order_delivery_projections AS last_change ON last_change.order_delivery_projection_id = change_record_finder.last_id
WHERE -- where clauses go here

Matt's solution would have worked in other SQL flavours, but MySQL lacks the required functionality to allow his otherwise correct answer to work.

Upvotes: 0

Matt
Matt

Reputation: 1757

Having a look around you need to used derived tables in order to access the alias name in the select query. see here from more info

try this query:

SELECT orders.order_id,
       orders.date_dispatched AS actual_dispatch_date,
       initial_delivery_projection,
       final_delivery_projection,
       DATEDIFF(final_delivery_projection - initial_delivery_projection) AS projection_days_revised
FROM (
    SELECT orders.order_id,
       orders.date_dispatched AS actual_dispatch_date,
       (
           SELECT projected_date 
           FROM order_delivery_projections
           WHERE order_id = orders.order_id
           ORDER BY order_delivery_projection_id ASC
           LIMIT 1
       ) AS initial_delivery_projection,
       (
           SELECT projected_date 
           FROM order_delivery_projections
           WHERE order_id = orders.order_id
           ORDER BY order_delivery_projection_id DESC
           LIMIT 1
       ) AS final_delivery_projection
   FROM orders) z

Upvotes: 0

xQbert
xQbert

Reputation: 35333

I believe the values for the select are generated at the same time. So date diff may not be able to be computed since the aliased value may not be known at the time. If you refactor the Select, you may be able to get the results you're looking for. Additionally this would give you access to additional table values which you may want for additional analysis on sub queries/data sets.

SELECT orders.order_id,
       orders.date_dispatched AS actual_dispatch_date,
       First.Projected_Date AS initial_delivery_projection,
       Current.Projected_Data AS final_delivery_projection,
       dateDiff(Current.Projected_date,First.Projected_date)
FROM orders O
LEFT JOIN (SELECT projected_date, Order_ID 
           FROM order_delivery_projections
           ORDER BY order_delivery_projection_id ASC
           LIMIT 1) First
    ON First.order_id = O.order_id
LEFT JOIN (SELECT projected_date, Order_ID
           FROM order_delivery_projections
           ORDER BY order_delivery_projection_id DESC
           LIMIT 1)
    ON Current.Order_ID = O.Order_ID

Is it possible that an Order_ID in Orders may not exist in order_Delivery_projections? If so then date diff on a null value may cause errors as well...

Upvotes: 1

Related Questions