art
art

Reputation: 13

SQL SUM Query returns duplicate rows in result

I have a table 'order' containing summary order information and a table 'orderItem' with each item of the order.

My issue is: when selecting the Sum of column 'orderQTY' from the 'Order' table, I get an incorrect total if i join the orderItem table.

The first query below gives me the correct total. However, as soon as I add the join to orderItem, the sum result is incorrectly duplicating the 'orderqty' column for each orderitem record.

nb: I know the below doesn't utilise the join and isn't necessary. I've removed the clauses referring to the joined table to simplify the question.

--RETURNS Correct value
select sum(o.orderqty) 
from [order] o 

--RETURNS the sum containing duplicates of o.orderqty
select sum(o.orderqty)
from [order] o  
join OrderItem oi on o.Id = oi.OrderId

-- adding clarification: ----
I am wanting to Sum the column 'orderqty' from table 'order' while joining to orderItem eg:

There would be multiple orderItems for each Order, but I only want to obviously only count the orderqty from the order table once per order.

select sum(o.ordertotal)
from [order] o with(NOLOCK)
join OrderItem oi on o.Id = oi.OrderId
where oi.mycolumn = 1

or would i need to do something like:

select sum(o.ordertotal)
from [order] o with(NOLOCK)
where o.id in (select orderid from orderitem where x = y)

Upvotes: 0

Views: 4577

Answers (4)

philipxy
philipxy

Reputation: 15118

A FROM clause (acts as if it) calculates an intermediate table.

The SELECT clause expression

sum(o.orderqty)

does not mean "the sum of the orderqty column of table o". It means "the sum of the o.orderqty column of the intermediate table generated by the FROM clause".

Here are example tables:

// [order]
Id  orderqty
 1   33
 2   66

// OrderItem
OrderId  ItemId
 2        7
 1        8
 1        9

Here is the second query:

select sum(o.orderqty)
from [order] o  
join OrderItem oi on o.Id = oi.OrderId

First a FROM calculates a "cross product" that has every possible combination of a row from the tables you JOIN. So each row of it has a subrow from each table. Each subrow has column names like its source table column names but with its source table name or alias prefixed with a dot. (If there's no ambiguity you don't need to mention the prefix and dot to refer to a FROM clause table column.)

// [order] o join OrderItem oi
o.Id  o.orderqty  oi.OrderId  oi.ItemId
  1     33           2           7
  1     33           1           8
  1     33           1           9
  2     66           2           7
  2     66           1           8
  2     66           1           9

Then the rows that fail the ONs and WHERE are removed:

// [order] o join OrderItem oi on o.Id = oi.OrderId
o.Id  o.orderqty  oi.OrderId  oi.ItemId
  1     33           1           8
  1     33           1           9
  2     66           2           7

That's the second query's FROM intermediate table that its SELECT uses. Here's the first query's FROM intermediate table that its SELECT uses:

// [order] o
o.Id  o.orderqty
  1     33
  2     66

So the second query's SUM(o.orderqty) is 33+33+66 while the first query's is 33+66. Since you want SUM for a FROM clause table that is [order], you want the first query.

(We can also describe this on a join-by-join basis, which allows us to explain how LEFT JOIN differs from INNER JOIN, CROSS JOIN and "," and how USING and NATURAL make a difference.)

Re update: You have not explained what rows you want back from your query. Maybe you want:

select oi.Id, oi.orderId, o.orderqty, ...
from [order] o  
join OrderItem oi on o.Id = oi.OrderId

or

select ..., (select sum(orderqty) from [order]) as totalqty, ...

Upvotes: 0

Jason Goemaat
Jason Goemaat

Reputation: 29204

Joins work to create a result set that combines data from multiple tables. SQL then operates aggregate functions and the where clause over this combined result set. So imagine you have 2 orders, 1 with a single item and 1 with 2 items:

// Order table
OrderId   OrderQty
1         5
2         3

// OrderItem table
OrderId   ItemId
1         1
2         1
2         2

Your join will give you this result:

OrderId   OrderQty   OrderId   ItemId
1         5          1         1
2         3          2         1
2         3          2         2

So when you SUM(OrderQty), you get 11 instead of 8 which I think you want. Linking to the other table doesn't make sense in this case...

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269503

It returns different results because the join multiplies the number of rows or filters out rows, both of which affect the sum. It is unclear what you really want to do. If you only want the sum of the quantities of orders that have orderlines, then use exists:

select sum(o.orderqty)
from [order] o  
where exists (select 1
              from OrderItem oi 
              where o.Id = oi.OrderId
             );

Upvotes: 4

Ron Smith
Ron Smith

Reputation: 3266

You can sum once per grouping (in this case order.id) using the row_number function:

select sum(orderqty)
from (
    select
        case row_number() over(partition by o.Id order by o.id)
            when 1 then orderqty
            else 0
        end as orderqty
    from [order] o  
        join OrderItem oi on o.Id = oi.OrderId
    ) o

Upvotes: 1

Related Questions