Reputation: 101
I'm battling to wrap my head around producing a single MySQL query that would heed the correct results.
I've got a table that is structured as followed:
workflow_status_history:
id reference status
1 308ffn3oneb Lead Received
2 308ffn3oneb Quoted
3 308ffn3oneb Invoiced
4 853442ec2fc Lead Received
As you can see, the workflow_status_history
table keeps a history of all the statuses of each workflow on our system, rather than replacing or overwriting the previous status with the new status. This helps with in-depth reporting and auditing. A workflow will always have a starting status of Lead Received
.
The problem however is that I need to select the reference
field of each row in the table who's latest or only status is Lead Received
. So in the example above, field number 4
would return, however fields 1
, 2
and 3
would not return because the latest status for that workflow reference is Invoiced
. But if 853442ec2fc
(field number 4
) gets a new status other than Lead Received
, it also should not return the next time the query runs.
My current query is as followed:
SELECT *, MAX(id) FROM workflow_status_history WHERE 'status' = 'Lead Received' GROUP BY reference LIMIT 20
This, of course, doesn't return the desired result because the WHERE
clause ensures that it returns all the rows that have a Lead Received
status, irrespective of it being the latest status or not. So it will always return the first 20 grouped workflow references in the table.
How would I go about producing the correct query to return the desired results?
Thanks for your help.
Upvotes: 1
Views: 158
Reputation: 64496
Although @Martin Schneider answer is correct, Below are 2 other ways to achieve expected output
Using inner join on same table
select a.*
from workflow_status_history a
join (
select reference,max(id) id
from workflow_status_history
group by reference
) b using(reference,id)
where a.status = 'Lead Received';
Using correlated sub query
select a.*
from workflow_status_history a
where a.status = 'Lead Received'
and a.id = (select max(id)
from workflow_status_history
where reference = a.reference)
Upvotes: 1
Reputation: 3268
This is a case for a left join with itself. The idea in this query is: select all references with status 'Lead Received' which do not have a row with the same reference and a higher ID. I assume you only use the id for determining what is the 'newer' status, no timestamp etc.
SELECT
DISTINCT h1.reference
FROM
workflow_status_history h1 LEFT JOIN workflow_status_history h2 ON
h1.reference = h2.reference AND
h1.id < h2.id
WHERE
h1.status = 'Lead Received' AND
h2.id IS NULL
Upvotes: 2