RookieSA
RookieSA

Reputation: 101

MySQL Select Latest Row of Specific Value

I'm battling to wrap my head around producing a single MySQL query that would heed the correct results.

I've got a table that is structured as followed:

workflow_status_history:

id    reference         status
 1    308ffn3oneb    Lead Received
 2    308ffn3oneb    Quoted
 3    308ffn3oneb    Invoiced
 4    853442ec2fc    Lead Received

As you can see, the workflow_status_history table keeps a history of all the statuses of each workflow on our system, rather than replacing or overwriting the previous status with the new status. This helps with in-depth reporting and auditing. A workflow will always have a starting status of Lead Received.

The problem however is that I need to select the reference field of each row in the table who's latest or only status is Lead Received. So in the example above, field number 4 would return, however fields 1, 2 and 3 would not return because the latest status for that workflow reference is Invoiced. But if 853442ec2fc (field number 4) gets a new status other than Lead Received, it also should not return the next time the query runs.

My current query is as followed:

SELECT *, MAX(id) FROM workflow_status_history WHERE 'status' = 'Lead Received' GROUP BY reference LIMIT 20

This, of course, doesn't return the desired result because the WHERE clause ensures that it returns all the rows that have a Lead Received status, irrespective of it being the latest status or not. So it will always return the first 20 grouped workflow references in the table.

How would I go about producing the correct query to return the desired results?

Thanks for your help.

Upvotes: 1

Views: 158

Answers (2)

M Khalid Junaid
M Khalid Junaid

Reputation: 64496

Although @Martin Schneider answer is correct, Below are 2 other ways to achieve expected output

Using inner join on same table

select a.*
from workflow_status_history a
join (
  select reference,max(id) id
  from workflow_status_history
  group by reference
) b using(reference,id)
where a.status = 'Lead Received';

Using correlated sub query

select a.*
from workflow_status_history a
where a.status = 'Lead Received'
and a.id = (select max(id)
            from workflow_status_history
            where reference = a.reference)

DEMO

Upvotes: 1

Martin Schneider
Martin Schneider

Reputation: 3268

This is a case for a left join with itself. The idea in this query is: select all references with status 'Lead Received' which do not have a row with the same reference and a higher ID. I assume you only use the id for determining what is the 'newer' status, no timestamp etc.

SELECT
  DISTINCT h1.reference
FROM
  workflow_status_history h1 LEFT JOIN workflow_status_history h2 ON 
                  h1.reference = h2.reference AND 
                  h1.id < h2.id

WHERE
  h1.status = 'Lead Received' AND
  h2.id IS NULL

Upvotes: 2

Related Questions