Reputation: 769
I've been working with this SQL problem for about 2 days now and suspect I'm very close to resolving the issue but just can't seem to find a solution that completely works.
What I'm attempting to do is a selective join on two tables called application_info
and application_status
that are used to store information about open access journal article funding requests.
application_info
has general information about the applicant and uses an auto indexing field called Application_ID
as a key field. application_status
is used to track the ongoing information about the status of the application (received, under review, funded, denied, withdrawn, etc.) as well as status of the journal article (submitted, accepted, resubmitted, published or rejected) and contains both an Application_ID
field and an auto indexing field called Status_ID
along with a status text and status date field.
Because we want to keep a running log of application, article, and funding status changes we don't want to overwrite existing rows in the application_status
with updated values, but instead want to only show the most recent status values. Because an application will eventually have more than one status change this creates a need to apply some sort of limit on the inner join of the status data to the application data so that only one row is returned for each application ID.
Here's an example of what I am attempting to do in a query that currently throws an error:
-- simplified example
SELECT
application_info.*,
artstatus.Status_ID AS Article_Status_ID,
artstatus.Application_ID AS Article_Application_ID,
artstatus.Status_State_Date AS Article_Status_State_Date,
artstatus.Status_State_Text AS Article_Status_State_Text
FROM application_info
LEFT JOIN (
SELECT
Status_ID,
Application_ID,
Status_State_Text,
Status_State_Date,
Status_State_InitiatedBy,
Status_State_ChangebBy,
Status_State_Notes
FROM application_status
WHERE Status_State_Text LIKE 'Article Status%'
AND Application_ID = application_info.Application_ID -- how to pass the current application_info.Application_ID from the ON clause to here?
-- and Application_ID = 29 -- this would be an option for specific IDs, but not an option for getting a complete list of application IDs with status
-- GROUP BY Application_ID -- reduces the sub query to 1 row (Yeah!) but returns the first row encountered before the ORDER BY comes into play
ORDER BY Status_ID DESC
-- a GROUP BY after the ORDER BY might resolve the issue if we could do a sort first
LIMIT 1 -- only want to get the first (most recent) row, only works correctly if passing an Application_ID
) AS artstatus
ON application_info.Application_ID = artstatus.Application_ID
-- WHERE application_info.Application_ID = 29 -- need to get all IDs with statu values as well as for specific ID requests
;
Eliminating the AND Application_ID = application_info.Application_ID
and portion of the sub query along with the LIMIT
causes the select to work, but returns a row for every status for a given application ID. I've tried messing with using MIN
/MAX
operators but have noticed that they return unpredictable rows from the application_status
table when they work.
I've also attempted to do sub selects in the ON
section of the join, but don't know how to make that work because the end result would always need to return an Application_ID
(can both Application_ID
and Status_ID
be returned and used?).
Any hints on how to get this to work as I'm intending? Can this even be done?
Further edit: working query below. The key was to move the sub query in the join one level deeper and then return just a single status ID.
-- simplified example (now working)
SELECT
application_info.*,
artstatus.Status_ID AS Article_Status_ID,
artstatus.Application_ID AS Article_Application_ID,
artstatus.Status_State_Date AS Article_Status_State_Date,
artstatus.Status_State_Text AS Article_Status_State_Text
FROM application_info
LEFT JOIN (
SELECT
Status_ID,
Application_ID,
Status_State_Text,
Status_State_Date,
Status_State_InitiatedBy,
Status_State_ChangebBy,
Status_State_Notes
FROM application_status AS artstatus_int
WHERE
-- sub query moved one level deeper so current join Application_ID can be passed
-- order by and limit can now be used
Status_ID = (
SELECT status_ID FROM application_status WHERE Application_ID = artstatus_int.Application_ID
AND status_State_Text LIKE 'Article Status%'
ORDER BY Status_ID DESC
LIMIT 1
)
ORDER BY Application_ID, Status_ID DESC
-- no need for GROUP BY or LIMIT here because only one row is returned per Application_ID
) AS artstatus
ON application_info.Application_ID = artstatus.Application_ID
-- WHERE application_info.Application_ID = 29 -- works for specific application ID as well
-- more LEFT JOINS follow
;
Upvotes: 2
Views: 6159
Reputation: 1269803
You can't have a correlated subquery in the from clause.
Try this idea instead:
select <whatever>
from (select a.*,
(select max(status_id) as maxstatusid
from application_status aps
where aps.application_id = a.application_id
) as maxstatusid
from application
) left outer join
application_status aps
on aps.status_id = a.maxstatusid
. . .
That is, put the correlated subquery in the select clause to get the most recent status. Then join this in to the status table to get other information. And, finish the query with other details.
You seem pretty adept at your SQL skills, so it doesn't seem necessary to rewrite the whole query for you.
Upvotes: 2