Reputation: 1

Find latest status using 2 tables in SQL

I have the following 2 tables:

orders  
id | user | amount |  
1 | ALEX | 10 |  
2 | BARB | 20 |  
3 | CARL | 30 |  
4 | DAVE | 40 |  
5 | EVIE | 50 |  
6 | FRAN | 60 | 

history  
order_id | status | date |  
1 | pending |  2017-04-01 10:02:47 |  
1 | shipped |  2017-05-01 05:58:35 |   
1 | delivered |  2017-06-01 12:56:32 |    
2 | pending |  2017-04-01 11:44:03 |  
2 | shipped |  2017-05-01 14:51:49 |     
2 | delivered |  2017-06-01 23:27:16 |    
3 | pending |  2017-07-01 20:44:02 |  
3 | shipped |  2017-08-01 17:10:18 |  
4 | pending | 2017-07-01 11:19:15 |  
4 | shipped | 2017-08-01 15:17:25 |  
5 | pending | 2017-10-01 21:59:13 |   
6 | pending | 2017-10-01 02:47:33 |

These are the important fields, as each table has a couple more columns that are not needed nor useful. What I need is a query that will show me only the latest "status" and "date" for each "order_id", meaning there should only be 1 line per "order_id". End result should look like so:

order_id | amount | user | status | date |     
1 | 10 | ALEX | delivered | 2017-06-01 12:56:32 |     
2 | 20 | BARB | delivered | 2017-06-01 23:27:16 |     
3 | 30 | CARL | shipped | 2017-08-01 17:10:18 |  
4 | 40 | DAVE | shipped | 2017-08-01 15:17:25 |  
5 | 50 | EVIE | pending | 2017-10-01 21:59:13 |  
6 | 60 | FRAN | pending | 2017-10-01 02:47:33 |

This is as far as I could go, which shows everything pretty much correctly but only for the delivered ones

SELECT orders.id, orders.user, orders.amount, history.status, history.date  
FROM orders  
JOIN history  
ON history.order_id = orders.id  
AND history.status = 'delivered'  
ORDER BY history.date DESC

As far as the DBMS I'm using, I'm not entirely sure, I'm using the SQL editor on the Mode Analytics website. For some other issues I have used solutions that were apparently only valid for use in MySQL, but in some others I used some that were only valid for use with PostgreSQL.

Upvotes: 0

Answers (3)

Moaaz K'

Reputation: 56

As you are not sure of the RDBMS you are using, I'll be writing ANSI compliant SQL queries that will run on any ANSI compliant SQL RDBMS.

This is a very common question in SQL: to find the whole row having maximum value. It is, actually, so common that StackOverflow community has created a single tag just to deal with questions like that: greatest-n-per-group.

Basically, you have two approaches to solve that problem:

Joining with simple `group-identifier, max-value-in-group` Sub-Query

In this approach, you first find the group-identifier, max-value-in-group in a sub-query. Then you join your table to the sub-query with equality on both group-identifier and max-value-in-group:

SELECT
    A.id order_id
    A.amount,
    A.user,
    B.status,
    B.date
FROM orders A
INNER JOIN history B
    ON A.id = B.order_id
INNER JOIN
(
    SELECT
        order_id,
        MAX(date) date
    FROM history
    GROUP BY
        order_id
) C
    ON B.order_id = C.order_id
    AND B.date = C.date

Left Joining with self, tweaking join conditions and filters

In this approach, you left join the table with itself. Equality goes in the group-identifier. Then, 2 smart moves:

The second join condition is having left side value less than right value
When you do step 1, the row(s) that actually have the max value will have NULL in the right side (it's a LEFT JOIN, remember?). Then, we filter the joined result, showing only the rows where the right side is NULL.

So you end up with:

SELECT
    A.id order_id
    A.amount,
    A.user,
    B.status,
    B.date
FROM orders A
INNER JOIN history B
    ON A.id = B.order_id
LEFT OUTER JOIN history C
    ON B.order_id = C.order_id
    AND B.date < C.date
WHERE
    C.order_id IS NULL

Conclusion

Both approaches bring the exact same result.

If you have two rows with max-value-in-group for group-identifier, both rows will be in the result in both approaches.

Both approaches are SQL ANSI compatible, thus, will work with your favorite RDBMS, regardless of its "flavor".

Both approaches are also performance friendly, however your mileage may vary (RDBMS, DB Structure, Indexes, etc.). So when you pick one approach over the other, benchmark. And make sure you pick the one which make most of sense to you.

Upvotes: 1

Hesam Akbari

Reputation: 1141

You can use Window Function as Below:

select 
 h.order_id
,o.amount
,o.user
,h.status
,h.date
from orders o 
left join
     (select *,
             row_number() over (partition by order_id order by date desc) as rowNum
      from history 
     ) h
     on h.order_id = o.id and rowNum= 1;

Upvotes: 0

Gordon Linoff

Reputation: 1269763

One method uses row_number() to identify the most recent row:

select o.*, h.*
from orders o left join
     (select h.*,
             row_number() over (partition by order_id order by date) as seqnum
      from history h
     ) h
     on h.order_id = o.id and seqnum = 1;

Upvotes: 0

Find latest status using 2 tables in SQL

Answers (3)

Joining with simple group-identifier, max-value-in-group Sub-Query

Left Joining with self, tweaking join conditions and filters

Conclusion

Related Questions

Joining with simple `group-identifier, max-value-in-group` Sub-Query