kurt wang
kurt wang

Reputation: 5

How can I avoid sum multiple times while using join

I am now using the mssql with its sample database "adventureworks 2014", here I faced some problems with join and sum, here is the two table I used:

PurchaseOrderHeader:
PurchaseOrderID  VendorID   OrderDate                TotalDue
1                1580       2011-04-16 00:00:00.000  222.1492
2                1496       2011-04-16 00:00:00.000  300.6721
3                1494       2011-04-16 00:00:00.000  9776.2665
4                1650       2011-04-16 00:00:00.000  189.0395
5                1654       2011-04-30 00:00:00.000  22539.0165
6                1664       2011-04-30 00:00:00.000  16164.0229
7                1678       2011-04-30 00:00:00.000  64847.5328

PurchaseOrderDetail:
PurchaseOrderID  PurchaseOrderDetailID  OrderQty    ProductID
1                1                      4           1
2                2                      3           359
2                3                      3           360
3                4                      550         530
4                5                      3           4
5                6                      550         512
6                7                      550         513
7                8                      550         317
7                9                      550         318
7                10                     550         319

Here is the sql script:

    CREATE TABLE PurchaseOrderHeader(
   PurchaseOrderID INTEGER  NOT NULL PRIMARY KEY 
  ,VendorID        INTEGER  NOT NULL
  ,OrderDate       VARCHAR(23) NOT NULL
  ,TotalDue        NUMERIC(10,4) NOT NULL
);
INSERT INTO PurchaseOrderHeader(PurchaseOrderID,VendorID,OrderDate,TotalDue) VALUES (1,1580,'2011-04-16 00:00:00.000',222.1492);
INSERT INTO PurchaseOrderHeader(PurchaseOrderID,VendorID,OrderDate,TotalDue) VALUES (2,1496,'2011-04-16 00:00:00.000',300.6721);
INSERT INTO PurchaseOrderHeader(PurchaseOrderID,VendorID,OrderDate,TotalDue) VALUES (3,1494,'2011-04-16 00:00:00.000',9776.2665);
INSERT INTO PurchaseOrderHeader(PurchaseOrderID,VendorID,OrderDate,TotalDue) VALUES (4,1650,'2011-04-16 00:00:00.000',189.0395);
INSERT INTO PurchaseOrderHeader(PurchaseOrderID,VendorID,OrderDate,TotalDue) VALUES (5,1654,'2011-04-30 00:00:00.000',22539.0165);
INSERT INTO PurchaseOrderHeader(PurchaseOrderID,VendorID,OrderDate,TotalDue) VALUES (6,1664,'2011-04-30 00:00:00.000',16164.0229);
INSERT INTO PurchaseOrderHeader(PurchaseOrderID,VendorID,OrderDate,TotalDue) VALUES (7,1678,'2011-04-30 00:00:00.000',64847.5328);


       CREATE TABLE PurchaseOrderDetail(
   PurchaseOrderID       INTEGER  NOT NULL
  ,PurchaseOrderDetailID INTEGER  NOT NULL PRIMARY KEY 
  ,OrderQty              INTEGER  NOT NULL
  ,ProductID             INTEGER  NOT NULL
);
INSERT INTO PurchaseOrderDetail(PurchaseOrderID,PurchaseOrderDetailID,OrderQty,ProductID) VALUES (1,1,4,1);
INSERT INTO PurchaseOrderDetail(PurchaseOrderID,PurchaseOrderDetailID,OrderQty,ProductID) VALUES (2,2,3,359);
INSERT INTO PurchaseOrderDetail(PurchaseOrderID,PurchaseOrderDetailID,OrderQty,ProductID) VALUES (2,3,3,360);
INSERT INTO PurchaseOrderDetail(PurchaseOrderID,PurchaseOrderDetailID,OrderQty,ProductID) VALUES (3,4,550,530);
INSERT INTO PurchaseOrderDetail(PurchaseOrderID,PurchaseOrderDetailID,OrderQty,ProductID) VALUES (4,5,3,4);
INSERT INTO PurchaseOrderDetail(PurchaseOrderID,PurchaseOrderDetailID,OrderQty,ProductID) VALUES (5,6,550,512);
INSERT INTO PurchaseOrderDetail(PurchaseOrderID,PurchaseOrderDetailID,OrderQty,ProductID) VALUES (6,7,550,513);
INSERT INTO PurchaseOrderDetail(PurchaseOrderID,PurchaseOrderDetailID,OrderQty,ProductID) VALUES (7,8,550,317);
INSERT INTO PurchaseOrderDetail(PurchaseOrderID,PurchaseOrderDetailID,OrderQty,ProductID) VALUES (7,9,550,318);
INSERT INTO PurchaseOrderDetail(PurchaseOrderID,PurchaseOrderDetailID,OrderQty,ProductID) VALUES (7,10,550,319);

and here is my code:

select PurchaseOrderHeader.VendorID,
SUM(CASE WHEN Datename(year,PurchaseOrderHeader.OrderDate) = 2011 THEN PurchaseOrderHeader.TotalDue else 0 END) as "TotalPay IN 2011", 
SUM(CASE WHEN Datename(year,PurchaseOrderHeader.OrderDate) = 2011 THEN PurchaseOrderDetail.OrderQty else 0 END) as "TotalOrder IN 2011"
from PurchaseOrderHeader
left join PurchaseOrderDetail on PurchaseOrderHeader.PurchaseOrderID = PurchaseOrderDetail.PurchaseOrderID
group by PurchaseOrderHeader.VendorID
order by VendorID

Here is what I got:

VendorID    TotalPay IN 2011    TotalOrder IN 2011
1494        9776.2665           550
1496        601.3442            6
1580        222.1492            4
1650        189.0395            3
1654        22539.0165          550
1664        16164.0229          550
1678        194542.5984         1650

while I should expect:

VendorID    TotalPay IN 2011    TotalOrder IN 2011
1494        9776.2665           550
1496        300.6721            6
1580        222.1492            4
1650        189.0395            3
1654        22539.0165          550
1664        16164.0229          550
1678        64847.5328          1650

This code will join two tables on PurchaseOrderID, and calculate the TotalDue grouped by vendorID. The problem is when I use join, where will be multiple rows from table PurchaseOrderDetail refered to one row in table PurchaseOrderHeader. In this example for vendor 1496 and 1678 there are two or three rows refer to one row in PurchaseDetailHeader. So it will be added two or three times. How should I avoid adding multiple times, thanks!

Upvotes: 0

Views: 1262

Answers (4)

Philip Kelley
Philip Kelley

Reputation: 40359

Lots of good answers, but I think they miss the bit where a vendor could have multiple purchase orders, and that throws off how the TotalOrder gets calculated. (Try a sample with multiple vendors with multiple orders with each order having multiple details.) Don't forget to check for possible NULL values!

Here, I use the subquery to calculate the TotalPay for each vendor for the year in question, and then join that back to the list of all vendors. (Threw in table aliases as well, for legibility.)

--  As a subquery
SELECT
   hd.VendorID,
  ,sum(case
         when year(hd.OrderDate) = 2011 then hd.TotalDue
         else 0
       end)                             as "TotalPay IN 2011"
  ,isnull(subQuery.TotaOrderIn2011, 0)  as "TotalOrder IN 2011"
 from PurchaseOrderHeader hd
  left join (--  Calculate volume by vendor for 2011
             select 
                hd.VendorID
               ,sum(OrderQty)  TotalOrderIn2011
              from PurchaseOrderHeader hd
               inner join PurchaseOrderDetail dt 
                on hd.PurchaseOrderID = dt.PurchaseOrderID
              where year(hd.OrderDate) = 2011
              group by
                hd.VendorID
    ) subQuery
   on subQuery.VendorId = hd.VendorId
 group by hd.VendorID
 order by hd.VendorID

Upvotes: 0

Matijs
Matijs

Reputation: 2553

The default way to avoid double counting is to use SUM(DISTINCT expr).

This does not always work well enough, as you do not want to sum distinct values, but want to sum distinct rows even when those rows share the same values.

The solution is to use a sub-query to sum the details on order number and then join the result. Then you have only one total per order id to join with the order lines:

SELECT PurchaseOrderHeader.VendorID, 
  SUM(PurchaseOrderHeader.TotalDue) AS "TotalPay IN 2011", 
  SUM(POD.Qty) AS "TotalOrder IN 2011"
FROM PurchaseOrderHeader 
LEFT JOIN (
    SELECT PurchaseOrderDetail.PurchaseOrderID, SUM(OrderQty) AS Qty
    FROM PurchaseOrderDetail 
    GROUP BY PurchaseOrderDetail.PurchaseOrderID
   ) AS POD on PurchaseOrderHeader.PurchaseOrderID = POD.PurchaseOrderID
WHERE Datename(year,PurchaseOrderHeader.OrderDate) = 2011 
GROUP BY PurchaseOrderHeader.VendorID
ORDER BY VendorID

Also I took the freedom to remove the CASE WHEN statement from the SUM() to the WHERE part of the query. In this case that should give you the same results with shorter code.

Upvotes: 0

CodeArhat
CodeArhat

Reputation: 16

select  h.VendorID,
        SUM(CASE WHEN Datename(year,h.OrderDate) = 2011 THEN h.TotalDue else 0 END) as "TotalPay IN 2011", 
        SUM(CASE WHEN Datename(year,h.OrderDate) = 2011 THEN d.OrderQty else 0 END) as "TotalOrder IN 2011"
from    PurchaseOrderHeader h
left    join (
        select  t.PurchaseOrderID,
                sum(t.OrderQty) as OrderQty
        from    PurchaseOrderDetail t
        group   by t.PurchaseOrderID
        ) d on d.PurchaseOrderID = h.PurchaseOrderID
group by h.VendorID
order by VendorID

Upvotes: 0

Sean Lange
Sean Lange

Reputation: 33581

You can just take your SUM and divide by COUNT. Something like this.

select PurchaseOrderHeader.VendorID, 
SUM(CASE WHEN Datename(year,PurchaseOrderHeader.OrderDate) = 2011 THEN PurchaseOrderHeader.TotalDue else 0 END) / COUNT(*) as "TotalPay IN 2011", 
SUM(CASE WHEN Datename(year,PurchaseOrderHeader.OrderDate) = 2011 THEN PurchaseOrderDetail.OrderQty else 0 END) / COUNT(*) as "TotalOrder IN 2011"
from Purchasing.PurchaseOrderHeader
left join Purchasing.PurchaseOrderDetail on PurchaseOrderHeader.PurchaseOrderID = PurchaseOrderDetail.PurchaseOrderID
group by PurchaseOrderHeader.VendorID
order by VendorID

Upvotes: 1

Related Questions