Reputation: 13
I have a table 'order' containing summary order information and a table 'orderItem' with each item of the order.
My issue is: when selecting the Sum of column 'orderQTY' from the 'Order' table, I get an incorrect total if i join the orderItem table.
The first query below gives me the correct total. However, as soon as I add the join to orderItem, the sum result is incorrectly duplicating the 'orderqty' column for each orderitem record.
nb: I know the below doesn't utilise the join and isn't necessary. I've removed the clauses referring to the joined table to simplify the question.
--RETURNS Correct value
select sum(o.orderqty)
from [order] o
--RETURNS the sum containing duplicates of o.orderqty
select sum(o.orderqty)
from [order] o
join OrderItem oi on o.Id = oi.OrderId
-- adding clarification: ----
I am wanting to Sum the column 'orderqty' from table 'order' while joining to orderItem eg:
There would be multiple orderItems for each Order, but I only want to obviously only count the orderqty from the order table once per order.
select sum(o.ordertotal)
from [order] o with(NOLOCK)
join OrderItem oi on o.Id = oi.OrderId
where oi.mycolumn = 1
or would i need to do something like:
select sum(o.ordertotal)
from [order] o with(NOLOCK)
where o.id in (select orderid from orderitem where x = y)
Upvotes: 0
Views: 4577
Reputation: 15118
A FROM clause (acts as if it) calculates an intermediate table.
The SELECT clause expression
sum(o.orderqty)
does not mean "the sum of the orderqty
column of table o
". It means "the sum of the o.orderqty
column of the intermediate table generated by the FROM clause".
Here are example tables:
// [order]
Id orderqty
1 33
2 66
// OrderItem
OrderId ItemId
2 7
1 8
1 9
Here is the second query:
select sum(o.orderqty)
from [order] o
join OrderItem oi on o.Id = oi.OrderId
First a FROM calculates a "cross product" that has every possible combination of a row from the tables you JOIN. So each row of it has a subrow from each table. Each subrow has column names like its source table column names but with its source table name or alias prefixed with a dot. (If there's no ambiguity you don't need to mention the prefix and dot to refer to a FROM clause table column.)
// [order] o join OrderItem oi
o.Id o.orderqty oi.OrderId oi.ItemId
1 33 2 7
1 33 1 8
1 33 1 9
2 66 2 7
2 66 1 8
2 66 1 9
Then the rows that fail the ONs and WHERE are removed:
// [order] o join OrderItem oi on o.Id = oi.OrderId
o.Id o.orderqty oi.OrderId oi.ItemId
1 33 1 8
1 33 1 9
2 66 2 7
That's the second query's FROM intermediate table that its SELECT uses. Here's the first query's FROM intermediate table that its SELECT uses:
// [order] o
o.Id o.orderqty
1 33
2 66
So the second query's SUM(o.orderqty) is 33+33+66 while the first query's is 33+66. Since you want SUM for a FROM clause table that is [order], you want the first query.
(We can also describe this on a join-by-join basis, which allows us to explain how LEFT JOIN differs from INNER JOIN, CROSS JOIN and "," and how USING and NATURAL make a difference.)
Re update: You have not explained what rows you want back from your query. Maybe you want:
select oi.Id, oi.orderId, o.orderqty, ...
from [order] o
join OrderItem oi on o.Id = oi.OrderId
or
select ..., (select sum(orderqty) from [order]) as totalqty, ...
Upvotes: 0
Reputation: 29204
Joins work to create a result set that combines data from multiple tables. SQL then operates aggregate functions and the where clause over this combined result set. So imagine you have 2 orders, 1 with a single item and 1 with 2 items:
// Order table
OrderId OrderQty
1 5
2 3
// OrderItem table
OrderId ItemId
1 1
2 1
2 2
Your join will give you this result:
OrderId OrderQty OrderId ItemId
1 5 1 1
2 3 2 1
2 3 2 2
So when you SUM(OrderQty)
, you get 11 instead of 8 which I think you want. Linking to the other table doesn't make sense in this case...
Upvotes: 0
Reputation: 1269503
It returns different results because the join
multiplies the number of rows or filters out rows, both of which affect the sum. It is unclear what you really want to do. If you only want the sum of the quantities of orders that have orderlines, then use exists:
select sum(o.orderqty)
from [order] o
where exists (select 1
from OrderItem oi
where o.Id = oi.OrderId
);
Upvotes: 4
Reputation: 3266
You can sum once per grouping (in this case order.id) using the row_number function:
select sum(orderqty)
from (
select
case row_number() over(partition by o.Id order by o.id)
when 1 then orderqty
else 0
end as orderqty
from [order] o
join OrderItem oi on o.Id = oi.OrderId
) o
Upvotes: 1