Reputation: 81516
I'm trying to generate a sales report that lists each product and its total sales in a given month. It's a little tricky because the prices of products can change throughout the month. For example:
This setup is represented in the database as follows:
Sales table

Sale_ID  ProductID  Sale_Date
1        1          2009-01-01
2        1          2009-01-01
3        1          2009-01-02
...
50       1          2009-01-15
51       1          2009-01-16
52       1          2009-01-17
...
100      1          2009-01-31

Prices table

Product_ID  Sale_Date   Price
1           2009-01-01  10.00
1           2009-01-16  15.00
When a price is defined in the prices table, it is applied to all products sold with the given ProductID from the given SaleDate going forward.
Basically, I'm looking for a query which returns data as follows:
Desired output

Sale_ID  ProductID  Sale_Date   Price
1        1          2009-01-01  10.00
2        1          2009-01-01  10.00
3        1          2009-01-02  10.00
...
50       1          2009-01-15  10.00
51       1          2009-01-16  15.00
52       1          2009-01-17  15.00
...
100      1          2009-01-31  15.00
I have the following query:
SELECT
    Sale_ID,
    Product_ID,
    Sale_Date,
    (
        SELECT TOP 1 Price
        FROM Prices
        WHERE
            Prices.Product_ID = Sales.Product_ID
            -- <= so that a price applies on the day it takes effect
            AND Prices.Sale_Date <= Sales.Sale_Date
        ORDER BY Prices.Sale_Date DESC
    ) AS Price
FROM Sales
This works, but is there a more efficient query than a nested sub-select?
And before you point out that it would just be easier to include "price" in the Sales table, I should mention that the schema is maintained by another vendor and I'm unable to change it. And in case it matters, I'm using SQL Server 2000.
Upvotes: 3
Views: 2477
Reputation: 607
I agree with Sean. The code you have written is very clean and understandable. If you are having performance issues, then take the extra effort to make the code faster. Otherwise, you are making the code more complex for no reason. Nested sub-selects are extremely useful when used judiciously.
Upvotes: 0
Reputation: 131112
If you start storing start and end dates, or create a view that includes them (you can even create an indexed view), then you can greatly simplify your query, provided you are certain there are no overlapping ranges.
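SELECT
    s.Sale_ID,
    s.Product_ID,
    s.Sale_Date,
    p.Price
FROM Sales s
JOIN Prices p
    ON p.Product_ID = s.Product_ID
    -- a price applies from its StartDate up to, but not including, its EndDate
    AND s.Sale_Date >= p.StartDate
    AND s.Sale_Date < p.EndDate
-- careful not to use BETWEEN: it includes both ends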
Note: a technique along these lines will let you do this with a view. If you need to index the view, the definition will have to be juggled around quite a bit.
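create table t (d datetime)
insert t values(getdate())
insert t values(getdate()+1)
insert t values(getdate()+2)
go
-- each row becomes a range from the previous date (or a default) to its own date
create view myview
as
select start = isnull(max(t2.d), '1975-1-1'), finish = t1.d
from t t1
left join t t2 on t1.d > t2.d
group by t1.d
go
-- create view must be alone in its batch, hence the go above
select * from myview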
start finish
----------------------- -----------------------
1975-01-01 00:00:00.000 2009-01-27 11:12:57.383
2009-01-27 11:12:57.383 2009-01-28 11:12:57.383
2009-01-28 11:12:57.383 2009-01-29 11:12:57.383
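Applied to the question's Prices table, the same self-join idea can derive an end date for each price row. What follows is only a sketch and has not been run on SQL Server 2000: it uses Python's built-in sqlite3 module as a stand-in, the view name PriceRanges and the far-future sentinel date are hypothetical, and dates are stored as ISO strings so that they compare correctly as text. Here each range runs from a price's own date up to, but not including, the next price change for the same product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales  (Sale_ID INTEGER, Product_ID INTEGER, Sale_Date TEXT);
CREATE TABLE Prices (Product_ID INTEGER, Sale_Date TEXT, Price REAL);

INSERT INTO Sales VALUES (1,   1, '2009-01-01');
INSERT INTO Sales VALUES (50,  1, '2009-01-15');
INSERT INTO Sales VALUES (51,  1, '2009-01-16');
INSERT INTO Sales VALUES (100, 1, '2009-01-31');

INSERT INTO Prices VALUES (1, '2009-01-01', 10.00);
INSERT INTO Prices VALUES (1, '2009-01-16', 15.00);

-- Hypothetical view: EndDate is the next price date for the same product,
-- or a sentinel far in the future when no later price exists.
CREATE VIEW PriceRanges AS
SELECT p1.Product_ID,
       p1.Price,
       p1.Sale_Date AS StartDate,
       COALESCE(MIN(p2.Sale_Date), '9999-12-31') AS EndDate
FROM Prices AS p1
LEFT JOIN Prices AS p2
       ON p2.Product_ID = p1.Product_ID
      AND p2.Sale_Date > p1.Sale_Date
GROUP BY p1.Product_ID, p1.Sale_Date, p1.Price;
""")

# Range join: start inclusive, end exclusive, so no sale matches two prices.
rows = conn.execute("""
SELECT s.Sale_ID, s.Product_ID, s.Sale_Date, r.Price
FROM Sales AS s
JOIN PriceRanges AS r
  ON r.Product_ID = s.Product_ID
 AND s.Sale_Date >= r.StartDate
 AND s.Sale_Date <  r.EndDate
ORDER BY s.Sale_ID
""").fetchall()

for row in rows:
    print(row)
```

With this sample data, sales 1 and 50 pick up 10.00 and sales 51 and 100 pick up 15.00. On SQL Server the COALESCE/MIN expression should carry over, but whether such a view can be indexed is a separate question.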
Upvotes: 2
Reputation: 37655
It's good to avoid these types of correlated subqueries. Here's a classic technique for such cases.
SELECT s.Sale_ID, s.Product_ID, s.Sale_Date, p1.Price
FROM Sales AS s
LEFT JOIN Prices AS p1
    ON s.Product_ID = p1.Product_ID
    AND s.Sale_Date >= p1.Sale_Date
LEFT JOIN Prices AS p2
    ON s.Product_ID = p2.Product_ID
    AND s.Sale_Date >= p2.Sale_Date
    AND p2.Sale_Date > p1.Sale_Date
WHERE p2.Price IS NULL -- want this one not to be found
Use a left outer join on the pricing table as p2, and look for a NULL record demonstrating that the matched product-price record found in p1 is the most recent on or before the sales date.
(I would have inner-joined the first price match, but if there is none, it's nice to have the product show up anyway so you know there's a problem.)
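To see the double left join at work on the question's sample data, here is a small sketch. It runs against SQLite through Python's built-in sqlite3 module rather than SQL Server 2000, so treat it as an illustration of the logic only. Sale 101 for product 2 is made up here to show the "no price found" case, and dates are stored as ISO strings so they compare correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales  (Sale_ID INTEGER, Product_ID INTEGER, Sale_Date TEXT);
CREATE TABLE Prices (Product_ID INTEGER, Sale_Date TEXT, Price REAL);

INSERT INTO Sales VALUES (1,   1, '2009-01-01');
INSERT INTO Sales VALUES (50,  1, '2009-01-15');
INSERT INTO Sales VALUES (51,  1, '2009-01-16');
INSERT INTO Sales VALUES (100, 1, '2009-01-31');
-- Hypothetical sale of a product that has no price rows at all.
INSERT INTO Sales VALUES (101, 2, '2009-01-20');

INSERT INTO Prices VALUES (1, '2009-01-01', 10.00);
INSERT INTO Prices VALUES (1, '2009-01-16', 15.00);
""")

# p1: any price on or before the sale date.
# p2: a later price that is still on or before the sale date.
# Keeping only rows where p2 is missing leaves the most recent p1.
rows = conn.execute("""
SELECT s.Sale_ID, s.Product_ID, s.Sale_Date, p1.Price
FROM Sales AS s
LEFT JOIN Prices AS p1
       ON s.Product_ID = p1.Product_ID
      AND s.Sale_Date >= p1.Sale_Date
LEFT JOIN Prices AS p2
       ON s.Product_ID = p2.Product_ID
      AND s.Sale_Date >= p2.Sale_Date
      AND p2.Sale_Date > p1.Sale_Date
WHERE p2.Price IS NULL
ORDER BY s.Sale_ID
""").fetchall()

for row in rows:
    print(row)
```

Sales 1 and 50 come back at 10.00, sales 51 and 100 at 15.00, and sale 101 comes back with a NULL price (None in Python) instead of disappearing, which is the reason for using a left join on p1.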
Upvotes: 2
Reputation: 120644
Are you actually running into performance problems or are you just anticipating them? I would implement this exactly as you have, were my hands tied from a schema-modification standpoint as yours are.
Upvotes: 0
Reputation: 9922
The combination of Product_ID and Sale_Date is your foreign key. Try a select-join on Product_ID, Sale_Date.
Upvotes: -1