Reputation: 1
I have the following 2 tables:
orders
id | user | amount |
1 | ALEX | 10 |
2 | BARB | 20 |
3 | CARL | 30 |
4 | DAVE | 40 |
5 | EVIE | 50 |
6 | FRAN | 60 |
history
order_id | status | date |
1 | pending | 2017-04-01 10:02:47 |
1 | shipped | 2017-05-01 05:58:35 |
1 | delivered | 2017-06-01 12:56:32 |
2 | pending | 2017-04-01 11:44:03 |
2 | shipped | 2017-05-01 14:51:49 |
2 | delivered | 2017-06-01 23:27:16 |
3 | pending | 2017-07-01 20:44:02 |
3 | shipped | 2017-08-01 17:10:18 |
4 | pending | 2017-07-01 11:19:15 |
4 | shipped | 2017-08-01 15:17:25 |
5 | pending | 2017-10-01 21:59:13 |
6 | pending | 2017-10-01 02:47:33 |
These are the important fields, as each table has a couple more columns that are not needed nor useful. What I need is a query that will show me only the latest "status" and "date" for each "order_id", meaning there should only be 1 line per "order_id". End result should look like so:
order_id | amount | user | status | date |
1 | 10 | ALEX | delivered | 2017-06-01 12:56:32 |
2 | 20 | BARB | delivered | 2017-06-01 23:27:16 |
3 | 30 | CARL | shipped | 2017-08-01 17:10:18 |
4 | 40 | DAVE | shipped | 2017-08-01 15:17:25 |
5 | 50 | EVIE | pending | 2017-10-01 21:59:13 |
6 | 60 | FRAN | pending | 2017-10-01 02:47:33 |
This is as far as I could go, which shows everything pretty much correctly but only for the delivered ones
SELECT orders.id, orders.user, orders.amount, history.status, history.date
FROM orders
JOIN history
ON history.order_id = orders.id
AND history.status = 'delivered'
ORDER BY history.date DESC
As far as the DBMS I'm using, I'm not entirely sure, I'm using the SQL editor on the Mode Analytics website. For some other issues I have used solutions that were apparently only valid for use in MySQL, but in some others I used some that were only valid for use with PostgreSQL.
Upvotes: 0
Views: 163
Reputation: 56
As you are not sure of the RDBMS you are using, I'll be writing ANSI compliant SQL queries that will run on any ANSI compliant SQL RDBMS.
This is a very common question in SQL: to find the whole row having maximum value. It is, actually, so common that StackOverflow community has created a single tag just to deal with questions like that: greatest-n-per-group.
Basically, you have two approaches to solve that problem:
group-identifier, max-value-in-group
Sub-QueryIn this approach, you first find the group-identifier, max-value-in-group
in a sub-query. Then you join your table to the sub-query with equality on both group-identifier
and max-value-in-group
:
SELECT
A.id order_id
A.amount,
A.user,
B.status,
B.date
FROM orders A
INNER JOIN history B
ON A.id = B.order_id
INNER JOIN
(
SELECT
order_id,
MAX(date) date
FROM history
GROUP BY
order_id
) C
ON B.order_id = C.order_id
AND B.date = C.date
In this approach, you left join the table with itself. Equality goes in the group-identifier
. Then, 2 smart moves:
NULL
in the right side (it's a LEFT JOIN
, remember?). Then, we filter the joined result, showing only the rows where the right side is NULL
.So you end up with:
SELECT
A.id order_id
A.amount,
A.user,
B.status,
B.date
FROM orders A
INNER JOIN history B
ON A.id = B.order_id
LEFT OUTER JOIN history C
ON B.order_id = C.order_id
AND B.date < C.date
WHERE
C.order_id IS NULL
Both approaches bring the exact same result.
If you have two rows with max-value-in-group
for group-identifier
, both rows will be in the result in both approaches.
Both approaches are SQL ANSI compatible, thus, will work with your favorite RDBMS, regardless of its "flavor".
Both approaches are also performance friendly, however your mileage may vary (RDBMS, DB Structure, Indexes, etc.). So when you pick one approach over the other, benchmark. And make sure you pick the one which make most of sense to you.
Upvotes: 1
Reputation: 1141
You can use Window Function as Below:
select
h.order_id
,o.amount
,o.user
,h.status
,h.date
from orders o
left join
(select *,
row_number() over (partition by order_id order by date desc) as rowNum
from history
) h
on h.order_id = o.id and rowNum= 1;
Upvotes: 0
Reputation: 1269763
One method uses row_number()
to identify the most recent row:
select o.*, h.*
from orders o left join
(select h.*,
row_number() over (partition by order_id order by date) as seqnum
from history h
) h
on h.order_id = o.id and seqnum = 1;
Upvotes: 0