Reputation: 5
I am now using the mssql with its sample database "adventureworks 2014", here I faced some problems with join and sum, here is the two table I used:
PurchaseOrderHeader:
PurchaseOrderID VendorID OrderDate TotalDue
1 1580 2011-04-16 00:00:00.000 222.1492
2 1496 2011-04-16 00:00:00.000 300.6721
3 1494 2011-04-16 00:00:00.000 9776.2665
4 1650 2011-04-16 00:00:00.000 189.0395
5 1654 2011-04-30 00:00:00.000 22539.0165
6 1664 2011-04-30 00:00:00.000 16164.0229
7 1678 2011-04-30 00:00:00.000 64847.5328
PurchaseOrderDetail:
PurchaseOrderID PurchaseOrderDetailID OrderQty ProductID
1 1 4 1
2 2 3 359
2 3 3 360
3 4 550 530
4 5 3 4
5 6 550 512
6 7 550 513
7 8 550 317
7 9 550 318
7 10 550 319
Here is the sql script:
CREATE TABLE PurchaseOrderHeader(
PurchaseOrderID INTEGER NOT NULL PRIMARY KEY
,VendorID INTEGER NOT NULL
,OrderDate VARCHAR(23) NOT NULL
,TotalDue NUMERIC(10,4) NOT NULL
);
INSERT INTO PurchaseOrderHeader(PurchaseOrderID,VendorID,OrderDate,TotalDue) VALUES (1,1580,'2011-04-16 00:00:00.000',222.1492);
INSERT INTO PurchaseOrderHeader(PurchaseOrderID,VendorID,OrderDate,TotalDue) VALUES (2,1496,'2011-04-16 00:00:00.000',300.6721);
INSERT INTO PurchaseOrderHeader(PurchaseOrderID,VendorID,OrderDate,TotalDue) VALUES (3,1494,'2011-04-16 00:00:00.000',9776.2665);
INSERT INTO PurchaseOrderHeader(PurchaseOrderID,VendorID,OrderDate,TotalDue) VALUES (4,1650,'2011-04-16 00:00:00.000',189.0395);
INSERT INTO PurchaseOrderHeader(PurchaseOrderID,VendorID,OrderDate,TotalDue) VALUES (5,1654,'2011-04-30 00:00:00.000',22539.0165);
INSERT INTO PurchaseOrderHeader(PurchaseOrderID,VendorID,OrderDate,TotalDue) VALUES (6,1664,'2011-04-30 00:00:00.000',16164.0229);
INSERT INTO PurchaseOrderHeader(PurchaseOrderID,VendorID,OrderDate,TotalDue) VALUES (7,1678,'2011-04-30 00:00:00.000',64847.5328);
CREATE TABLE PurchaseOrderDetail(
PurchaseOrderID INTEGER NOT NULL
,PurchaseOrderDetailID INTEGER NOT NULL PRIMARY KEY
,OrderQty INTEGER NOT NULL
,ProductID INTEGER NOT NULL
);
INSERT INTO PurchaseOrderDetail(PurchaseOrderID,PurchaseOrderDetailID,OrderQty,ProductID) VALUES (1,1,4,1);
INSERT INTO PurchaseOrderDetail(PurchaseOrderID,PurchaseOrderDetailID,OrderQty,ProductID) VALUES (2,2,3,359);
INSERT INTO PurchaseOrderDetail(PurchaseOrderID,PurchaseOrderDetailID,OrderQty,ProductID) VALUES (2,3,3,360);
INSERT INTO PurchaseOrderDetail(PurchaseOrderID,PurchaseOrderDetailID,OrderQty,ProductID) VALUES (3,4,550,530);
INSERT INTO PurchaseOrderDetail(PurchaseOrderID,PurchaseOrderDetailID,OrderQty,ProductID) VALUES (4,5,3,4);
INSERT INTO PurchaseOrderDetail(PurchaseOrderID,PurchaseOrderDetailID,OrderQty,ProductID) VALUES (5,6,550,512);
INSERT INTO PurchaseOrderDetail(PurchaseOrderID,PurchaseOrderDetailID,OrderQty,ProductID) VALUES (6,7,550,513);
INSERT INTO PurchaseOrderDetail(PurchaseOrderID,PurchaseOrderDetailID,OrderQty,ProductID) VALUES (7,8,550,317);
INSERT INTO PurchaseOrderDetail(PurchaseOrderID,PurchaseOrderDetailID,OrderQty,ProductID) VALUES (7,9,550,318);
INSERT INTO PurchaseOrderDetail(PurchaseOrderID,PurchaseOrderDetailID,OrderQty,ProductID) VALUES (7,10,550,319);
and here is my code:
select PurchaseOrderHeader.VendorID,
SUM(CASE WHEN Datename(year,PurchaseOrderHeader.OrderDate) = 2011 THEN PurchaseOrderHeader.TotalDue else 0 END) as "TotalPay IN 2011",
SUM(CASE WHEN Datename(year,PurchaseOrderHeader.OrderDate) = 2011 THEN PurchaseOrderDetail.OrderQty else 0 END) as "TotalOrder IN 2011"
from PurchaseOrderHeader
left join PurchaseOrderDetail on PurchaseOrderHeader.PurchaseOrderID = PurchaseOrderDetail.PurchaseOrderID
group by PurchaseOrderHeader.VendorID
order by VendorID
Here is what I got:
VendorID TotalPay IN 2011 TotalOrder IN 2011
1494 9776.2665 550
1496 601.3442 6
1580 222.1492 4
1650 189.0395 3
1654 22539.0165 550
1664 16164.0229 550
1678 194542.5984 1650
while I should expect:
VendorID TotalPay IN 2011 TotalOrder IN 2011
1494 9776.2665 550
1496 300.6721 6
1580 222.1492 4
1650 189.0395 3
1654 22539.0165 550
1664 16164.0229 550
1678 64847.5328 1650
This code will join two tables on PurchaseOrderID, and calculate the TotalDue grouped by vendorID. The problem is when I use join, where will be multiple rows from table PurchaseOrderDetail refered to one row in table PurchaseOrderHeader. In this example for vendor 1496 and 1678 there are two or three rows refer to one row in PurchaseDetailHeader. So it will be added two or three times. How should I avoid adding multiple times, thanks!
Upvotes: 0
Views: 1262
Reputation: 40359
Lots of good answers, but I think they miss the bit where a vendor could have multiple purchase orders, and that throws off how the TotalOrder gets calculated. (Try a sample with multiple vendors with multiple orders with each order having multiple details.) Don't forget to check for possible NULL values!
Here, I use the subquery to calculate the TotalPay for each vendor for the year in question, and then join that back to the list of all vendors. (Threw in table aliases as well, for legibility.)
-- As a subquery
SELECT
hd.VendorID,
,sum(case
when year(hd.OrderDate) = 2011 then hd.TotalDue
else 0
end) as "TotalPay IN 2011"
,isnull(subQuery.TotaOrderIn2011, 0) as "TotalOrder IN 2011"
from PurchaseOrderHeader hd
left join (-- Calculate volume by vendor for 2011
select
hd.VendorID
,sum(OrderQty) TotalOrderIn2011
from PurchaseOrderHeader hd
inner join PurchaseOrderDetail dt
on hd.PurchaseOrderID = dt.PurchaseOrderID
where year(hd.OrderDate) = 2011
group by
hd.VendorID
) subQuery
on subQuery.VendorId = hd.VendorId
group by hd.VendorID
order by hd.VendorID
Upvotes: 0
Reputation: 2553
The default way to avoid double counting is to use SUM(DISTINCT expr)
.
This does not always work well enough, as you do not want to sum distinct values, but want to sum distinct rows even when those rows share the same values.
The solution is to use a sub-query to sum the details on order number and then join the result. Then you have only one total per order id to join with the order lines:
SELECT PurchaseOrderHeader.VendorID,
SUM(PurchaseOrderHeader.TotalDue) AS "TotalPay IN 2011",
SUM(POD.Qty) AS "TotalOrder IN 2011"
FROM PurchaseOrderHeader
LEFT JOIN (
SELECT PurchaseOrderDetail.PurchaseOrderID, SUM(OrderQty) AS Qty
FROM PurchaseOrderDetail
GROUP BY PurchaseOrderDetail.PurchaseOrderID
) AS POD on PurchaseOrderHeader.PurchaseOrderID = POD.PurchaseOrderID
WHERE Datename(year,PurchaseOrderHeader.OrderDate) = 2011
GROUP BY PurchaseOrderHeader.VendorID
ORDER BY VendorID
Also I took the freedom to remove the CASE WHEN
statement from the SUM()
to the WHERE
part of the query. In this case that should give you the same results with shorter code.
Upvotes: 0
Reputation: 16
select h.VendorID,
SUM(CASE WHEN Datename(year,h.OrderDate) = 2011 THEN h.TotalDue else 0 END) as "TotalPay IN 2011",
SUM(CASE WHEN Datename(year,h.OrderDate) = 2011 THEN d.OrderQty else 0 END) as "TotalOrder IN 2011"
from PurchaseOrderHeader h
left join (
select t.PurchaseOrderID,
sum(t.OrderQty) as OrderQty
from PurchaseOrderDetail t
group by t.PurchaseOrderID
) d on d.PurchaseOrderID = h.PurchaseOrderID
group by h.VendorID
order by VendorID
Upvotes: 0
Reputation: 33581
You can just take your SUM and divide by COUNT. Something like this.
select PurchaseOrderHeader.VendorID,
SUM(CASE WHEN Datename(year,PurchaseOrderHeader.OrderDate) = 2011 THEN PurchaseOrderHeader.TotalDue else 0 END) / COUNT(*) as "TotalPay IN 2011",
SUM(CASE WHEN Datename(year,PurchaseOrderHeader.OrderDate) = 2011 THEN PurchaseOrderDetail.OrderQty else 0 END) / COUNT(*) as "TotalOrder IN 2011"
from Purchasing.PurchaseOrderHeader
left join Purchasing.PurchaseOrderDetail on PurchaseOrderHeader.PurchaseOrderID = PurchaseOrderDetail.PurchaseOrderID
group by PurchaseOrderHeader.VendorID
order by VendorID
Upvotes: 1