Rick Sarvas

Reputation: 769

SQL select with inner join, sub select and limit

I've been working with this SQL problem for about 2 days now and suspect I'm very close to resolving the issue but just can't seem to find a solution that completely works.

What I'm attempting to do is a selective join on two tables called application_info and application_status that are used to store information about open access journal article funding requests.

application_info holds general information about the applicant and uses an auto-incrementing field called Application_ID as its key. application_status tracks the status of the application (received, under review, funded, denied, withdrawn, etc.) as well as the status of the journal article (submitted, accepted, resubmitted, published, or rejected). It contains an Application_ID field, an auto-incrementing field called Status_ID, and a status text and status date field.

Because we want to keep a running log of application, article, and funding status changes, we don't want to overwrite existing rows in the application_status table with updated values; instead, we only want to show the most recent status values. Because an application will eventually have more than one status change, the join of the status data to the application data needs some sort of limit so that only one row is returned for each application ID.

Here's an example of what I am attempting to do in a query that currently throws an error:

-- simplified example
SELECT 
application_info.*,
artstatus.Status_ID AS Article_Status_ID,
artstatus.Application_ID AS Article_Application_ID,
artstatus.Status_State_Date AS Article_Status_State_Date,
artstatus.Status_State_Text AS Article_Status_State_Text
FROM application_info
LEFT JOIN (
    SELECT 
    Status_ID,
    Application_ID,
    Status_State_Text,
    Status_State_Date,
    Status_State_InitiatedBy,
    Status_State_ChangebBy,
    Status_State_Notes
    FROM application_status 
    WHERE Status_State_Text LIKE 'Article Status%'
    AND Application_ID = application_info.Application_ID -- how to pass the current application_info.Application_ID from the ON clause to here?
    -- and Application_ID = 29 -- this would be an option for specific IDs, but not an option for getting a complete list of application IDs with status
    -- GROUP BY Application_ID -- reduces the sub query to 1 row (Yeah!) but returns the first row encountered before the ORDER BY comes into play
    ORDER BY Status_ID DESC
    -- a GROUP BY after the ORDER BY might resolve the issue if we could do a sort first
    LIMIT 1 -- only want to get the first (most recent) row, only works correctly if passing an Application_ID
) AS artstatus
ON application_info.Application_ID = artstatus.Application_ID
-- WHERE application_info.Application_ID = 29 -- need to get all IDs with status values as well as for specific ID requests
;

Eliminating the AND Application_ID = application_info.Application_ID portion of the sub query along with the LIMIT causes the select to work, but returns a row for every status for a given application ID. I've tried using MIN/MAX, but noticed that, when they work at all, they return unpredictable rows from the application_status table.
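One common way around the MIN/MAX problem described above is to compute only MAX(Status_ID) per Application_ID in a grouped derived table and then join back to application_status on that ID, so the text and date columns always come from the same row as the maximum. The sketch below is a minimal illustration, not the asker's actual setup: it uses SQLite through Python's sqlite3 module as a stand-in database, and the sample rows and the Applicant_Name column are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in; sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE application_info (
    Application_ID INTEGER PRIMARY KEY,
    Applicant_Name TEXT              -- hypothetical column
);
CREATE TABLE application_status (
    Status_ID INTEGER PRIMARY KEY,
    Application_ID INTEGER,
    Status_State_Text TEXT,
    Status_State_Date TEXT
);
INSERT INTO application_info VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO application_status VALUES
    (1, 1, 'Article Status: submitted',    '2013-01-05'),
    (2, 1, 'Application Status: received', '2013-01-06'),
    (3, 2, 'Article Status: submitted',    '2013-01-07'),
    (4, 1, 'Article Status: accepted',     '2013-02-01'),
    (5, 2, 'Application Status: funded',   '2013-02-02');
""")

# The grouped derived table is not correlated, so it is legal in FROM.
# It yields one (Application_ID, Max_Status_ID) pair per application;
# the second join fetches the full row for that Status_ID.
latest_article_status = conn.execute("""
SELECT ai.Application_ID, s.Status_ID, s.Status_State_Text
FROM application_info AS ai
LEFT JOIN (
    SELECT Application_ID, MAX(Status_ID) AS Max_Status_ID
    FROM application_status
    WHERE Status_State_Text LIKE 'Article Status%'
    GROUP BY Application_ID
) AS latest ON latest.Application_ID = ai.Application_ID
LEFT JOIN application_status AS s ON s.Status_ID = latest.Max_Status_ID
ORDER BY ai.Application_ID
""").fetchall()

for row in latest_article_status:
    print(row)
```

Applications with no article status at all (application 3 in the sample data) still appear, with NULL status columns, because both joins are LEFT JOINs.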

I've also attempted to do sub selects in the ON section of the join, but don't know how to make that work because the end result would always need to return an Application_ID (can both Application_ID and Status_ID be returned and used?).

Any hints on how to get this to work as I'm intending? Can this even be done?

Further edit: working query below. The key was to move the sub query in the join one level deeper and then return just a single status ID.

-- simplified example (now working)
SELECT 
application_info.*,
artstatus.Status_ID AS Article_Status_ID,
artstatus.Application_ID AS Article_Application_ID,
artstatus.Status_State_Date AS Article_Status_State_Date,
artstatus.Status_State_Text AS Article_Status_State_Text
FROM application_info
LEFT JOIN (
    SELECT 
    Status_ID,
    Application_ID,
    Status_State_Text,
    Status_State_Date,
    Status_State_InitiatedBy,
    Status_State_ChangebBy,
    Status_State_Notes
    FROM application_status AS artstatus_int
    WHERE 
    -- sub query moved one level deeper so current join Application_ID can be passed
    -- order by and limit can now be used
    Status_ID = (
        SELECT Status_ID FROM application_status WHERE Application_ID = artstatus_int.Application_ID
        AND Status_State_Text LIKE 'Article Status%'
        ORDER BY Status_ID DESC
        LIMIT 1
    )
    ORDER BY Application_ID, Status_ID DESC
    -- no need for GROUP BY or LIMIT here because only one row is returned per Application_ID
) AS artstatus
ON application_info.Application_ID = artstatus.Application_ID
-- WHERE application_info.Application_ID = 29 -- works for specific application ID as well

-- more LEFT JOINS follow
;
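To check that the working query returns at most one status row per application, it can be run against a few sample rows. The sketch below is only an illustration under assumptions: SQLite via Python's sqlite3 stands in for the real database, the sample data is hypothetical, the column list is trimmed, and the innermost table is given the alias inner_s (not in the original query) to make the correlation explicit.

```python
import sqlite3

# In-memory SQLite database as a stand-in; sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE application_info (Application_ID INTEGER PRIMARY KEY);
CREATE TABLE application_status (
    Status_ID INTEGER PRIMARY KEY,
    Application_ID INTEGER,
    Status_State_Text TEXT
);
INSERT INTO application_info VALUES (1), (2), (3);
INSERT INTO application_status VALUES
    (1, 1, 'Article Status: submitted'),
    (2, 1, 'Application Status: received'),
    (3, 2, 'Article Status: submitted'),
    (4, 1, 'Article Status: accepted'),
    (5, 2, 'Application Status: funded');
""")

# The correlated subquery sits in the WHERE clause of the derived table,
# where it may reference artstatus_int; it picks the newest article
# Status_ID for the application that the current status row belongs to.
rows = conn.execute("""
SELECT ai.Application_ID, artstatus.Status_ID, artstatus.Status_State_Text
FROM application_info AS ai
LEFT JOIN (
    SELECT Status_ID, Application_ID, Status_State_Text
    FROM application_status AS artstatus_int
    WHERE Status_ID = (
        SELECT inner_s.Status_ID
        FROM application_status AS inner_s
        WHERE inner_s.Application_ID = artstatus_int.Application_ID
          AND inner_s.Status_State_Text LIKE 'Article Status%'
        ORDER BY inner_s.Status_ID DESC
        LIMIT 1
    )
) AS artstatus ON ai.Application_ID = artstatus.Application_ID
ORDER BY ai.Application_ID
""").fetchall()

for row in rows:
    print(row)
```

The ORDER BY inside the derived table is left out here because the outer query does not depend on the derived table's row order.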

Upvotes: 2

Views: 6159

Answers (1)

Gordon Linoff

Reputation: 1269803

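You can't have a correlated subquery in the from clause: a derived table cannot reference columns of other tables in the same from clause (unless the database supports lateral joins).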

Try this idea instead:

select <whatever>
from (select a.*,
             -- correlated subquery: most recent status ID for this application
             (select max(aps.status_id)
              from application_status aps
              where aps.application_id = a.application_id
             ) as maxstatusid
      from application_info a
     ) a left outer join
     application_status aps
     on aps.status_id = a.maxstatusid
. . .

That is, put the correlated subquery in the select clause to get the most recent status ID. Then join the result to the status table to get the other information, and finish the query with the remaining details.

You seem pretty adept at your SQL skills, so it doesn't seem necessary to rewrite the whole query for you.
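For readers who want to try the idea end to end, here is a minimal sketch under stated assumptions: SQLite via Python's sqlite3 stands in for the real database, the sample rows are hypothetical, and the LIKE 'Article Status%' filter is carried over from the question rather than from the outline above.

```python
import sqlite3

# In-memory SQLite database as a stand-in; sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE application_info (Application_ID INTEGER PRIMARY KEY);
CREATE TABLE application_status (
    Status_ID INTEGER PRIMARY KEY,
    Application_ID INTEGER,
    Status_State_Text TEXT
);
INSERT INTO application_info VALUES (1), (2), (3);
INSERT INTO application_status VALUES
    (1, 1, 'Article Status: submitted'),
    (2, 1, 'Application Status: received'),
    (3, 2, 'Article Status: submitted'),
    (4, 1, 'Article Status: accepted'),
    (5, 2, 'Application Status: funded');
""")

# Step 1 (inner select): a correlated scalar subquery in the SELECT list
# attaches the newest article Status_ID to each application row.
# Step 2 (outer join): that ID is joined back to application_status to
# fetch the remaining columns of the same status row.
result = conn.execute("""
SELECT a.Application_ID, aps.Status_ID, aps.Status_State_Text
FROM (
    SELECT ai.*,
           (SELECT MAX(s.Status_ID)
            FROM application_status AS s
            WHERE s.Application_ID = ai.Application_ID
              AND s.Status_State_Text LIKE 'Article Status%'
           ) AS maxstatusid
    FROM application_info AS ai
) AS a
LEFT OUTER JOIN application_status AS aps
    ON aps.Status_ID = a.maxstatusid
ORDER BY a.Application_ID
""").fetchall()

for row in result:
    print(row)
```

When an application has no matching status, MAX returns NULL, the join condition never matches, and the left outer join keeps the application row with NULL status columns.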

Upvotes: 2
