Reputation: 256571
I'll use a concrete, but hypothetical, example.
Each Order normally has only one line item:
Orders:
OrderGUID OrderNumber
========= ============
{FFB2...} STL-7442-1
{3EC6...} MPT-9931-8A
LineItems:
LineItemGUID Order ID Quantity Description
============ ======== ======== =================================
{098FBE3...} 1 7 prefabulated amulite
{1609B09...} 2 32 spurving bearing
But occasionally there will be an order with two line items:
LineItemID Order ID Quantity Description
========== ======== ======== =================================
{A58A1...} 6,784,329 5 pentametric fan
{0E9BC...} 6,784,329 5 differential girdlespring
Normally when showing the orders to the user:
SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
FROM Orders
INNER JOIN LineItems
ON Orders.OrderID = LineItems.OrderID
I want to show the single item on the order. But with this occasional order containing two (or more) items, the orders would appear be duplicated:
OrderNumber Quantity Description
=========== ======== ====================
STL-7442-1 7 prefabulated amulite
MPT-9931-8A 32 spurving bearing
KSG-0619-81 5 panametric fan
KSG-0619-81 5 differential girdlespring
What I really want is to have SQL Server just pick one, as it will be good enough:
OrderNumber Quantity Description
=========== ======== ====================
STL-7442-1 7 prefabulated amulite
MPT-9931-8A 32 differential girdlespring
KSG-0619-81 5 panametric fan
If I get adventurous, I might show the user, an ellipsis to indicate that there's more than one:
OrderNumber Quantity Description
=========== ======== ====================
STL-7442-1 7 prefabulated amulite
MPT-9931-8A 32 differential girdlespring
KSG-0619-81 5 panametric fan, ...
So the question is how to either
My first naive attempt was to only join to the "TOP 1" line items:
SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
FROM Orders
INNER JOIN (
SELECT TOP 1 LineItems.Quantity, LineItems.Description
FROM LineItems
WHERE LineItems.OrderID = Orders.OrderID) LineItems2
ON 1=1
But that gives the error:
The column or prefix 'Orders' does not
match with a table name or alias name
used in the query.
Presumably because the inner select doesn't see the outer table.
Upvotes: 958
Views: 886954
Reputation: 11
SELECT
Orders.OrderNumber,
LineItems.Quantity,
LineItems.Description
FROM
Orders
JOIN LineItems
ON LineItems.LineItemGUID =
(
SELECT TOP 1 LineItemGUID
FROM LineItems
WHERE OrderID = Orders.OrderID
)
Upvotes: 1
Reputation: 425221
SELECT Orders.OrderNumber, LineItems2.Quantity, LineItems2.Description
FROM Orders
CROSS APPLY
(
SELECT TOP 1 LineItems.Quantity, LineItems.Description
FROM LineItems
WHERE LineItems.OrderID = Orders.OrderID
) LineItems2
For SQL Server versions prior to 2005, you need to use an INNER JOIN
instead of a CROSS APPLY
:
SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
FROM Orders
JOIN LineItems
ON LineItems.LineItemGUID =
(
SELECT TOP 1 LineItemGUID
FROM LineItems
WHERE OrderID = Orders.OrderID
)
Please note that TOP 1
without ORDER BY
is not deterministic: this query you will get you one line item per order, but it is not defined which one will it be.
Multiple invocations of the query can give you different line items for the same order, even if the underlying did not change.
If you want deterministic order, you should add an ORDER BY
clause to the innermost query.
Upvotes: 1507
Reputation: 314
From SQL Server 2012 and onwards I think this will do the trick:
SELECT DISTINCT
o.OrderNumber ,
FIRST_VALUE(li.Quantity) OVER ( PARTITION BY o.OrderNumber ORDER BY li.Description ) AS Quantity ,
FIRST_VALUE(li.Description) OVER ( PARTITION BY o.OrderNumber ORDER BY li.Description ) AS Description
FROM Orders AS o
INNER JOIN LineItems AS li ON o.OrderID = li.OrderID
Upvotes: 28
Reputation: 1753
,Another aproach using common table expression:
with firstOnly as (
select Orders.OrderNumber, LineItems.Quantity, LineItems.Description, ROW_NUMBER() over (partiton by Orders.OrderID order by Orders.OrderID) lp
FROM Orders
join LineItems on Orders.OrderID = LineItems.OrderID
) select *
from firstOnly
where lp = 1
or, in the end maybe you would like to show all rows joined?
comma separated version here:
select *
from Orders o
cross apply (
select CAST((select l.Description + ','
from LineItems l
where l.OrderID = s.OrderID
for xml path('')) as nvarchar(max)) l
) lines
Upvotes: 16
Reputation: 1123
My favorite way to run this query is with a not exists clause. I believe this is the most efficient way to run this sort of query:
select o.OrderNumber,
li.Quantity,
li.Description
from Orders as o
inner join LineItems as li
on li.OrderID = o.OrderID
where not exists (
select 1
from LineItems as li_later
where li_later.OrderID = o.OrderID
and li_later.LineItemGUID > li.LineItemGUID
)
But I have not tested this method against other methods suggested here.
Upvotes: 5
Reputation: 409
Correlated sub queries are sub queries that depend on the outer query. It’s like a for loop in SQL. The sub-query will run once for each row in the outer query:
select * from users join widgets on widgets.id = (
select id from widgets
where widgets.user_id = users.id
order by created_at desc
limit 1
)
Upvotes: 11
Reputation: 10203
@Quassnoi answer is good, in some cases (especially if the outer table is big), a more efficient query might be with using windowed functions, like this:
SELECT Orders.OrderNumber, LineItems2.Quantity, LineItems2.Description
FROM Orders
LEFT JOIN
(
SELECT LineItems.Quantity, LineItems.Description, OrderId, ROW_NUMBER()
OVER (PARTITION BY OrderId ORDER BY (SELECT NULL)) AS RowNum
FROM LineItems
) LineItems2 ON LineItems2.OrderId = Orders.OrderID And RowNum = 1
Sometimes you just need to test which query gives better performance.
Upvotes: 68
Reputation: 71
Tried the cross, works nicely, but takes slightly longer. Adjusted line columns to have max and added group which kept speed and dropped the extra record.
Here's the adjusted query:
SELECT Orders.OrderNumber, max(LineItems.Quantity), max(LineItems.Description)
FROM Orders
INNER JOIN LineItems
ON Orders.OrderID = LineItems.OrderID
Group by Orders.OrderNumber
Upvotes: 3
Reputation: 1903
I know this question was answered a while ago, but when dealing with large data sets, nested queries can be costly. Here is a different solution where the nested query will only be ran once, instead of for each row returned.
SELECT
Orders.OrderNumber,
LineItems.Quantity,
LineItems.Description
FROM
Orders
INNER JOIN (
SELECT
Orders.OrderNumber,
Max(LineItem.LineItemID) AS LineItemID
FROM
Orders INNER JOIN LineItems
ON Orders.OrderNumber = LineItems.OrderNumber
GROUP BY Orders.OrderNumber
) AS Items ON Orders.OrderNumber = Items.OrderNumber
INNER JOIN LineItems
ON Items.LineItemID = LineItems.LineItemID
Upvotes: 152
Reputation: 11007
EDIT: nevermind, Quassnoi has a better answer.
For SQL2K, something like this:
SELECT
Orders.OrderNumber
, LineItems.Quantity
, LineItems.Description
FROM (
SELECT
Orders.OrderID
, Orders.OrderNumber
, FirstLineItemID = (
SELECT TOP 1 LineItemID
FROM LineItems
WHERE LineItems.OrderID = Orders.OrderID
ORDER BY LineItemID -- or whatever else
)
FROM Orders
) Orders
JOIN LineItems
ON LineItems.OrderID = Orders.OrderID
AND LineItems.LineItemID = Orders.FirstLineItemID
Upvotes: 4
Reputation: 338078
You could do:
SELECT
Orders.OrderNumber,
LineItems.Quantity,
LineItems.Description
FROM
Orders INNER JOIN LineItems
ON Orders.OrderID = LineItems.OrderID
WHERE
LineItems.LineItemID = (
SELECT MIN(LineItemID)
FROM LineItems
WHERE OrderID = Orders.OrderID
)
This requires an index (or primary key) on LineItems.LineItemID
and an index on LineItems.OrderID
or it will be slow.
Upvotes: 30