Reputation: 81516

Optimizing Query With Subselect

I'm trying to generate a sales report which lists each product and its total sales in a given month. It's a little tricky because the prices of products can change throughout the month. For example:

Between Jan-01 and Jan-15, my company sells 50 Widgets at a cost of $10 each
Between Jan-16 and Jan-31, my company sells 50 more Widgets at a cost of $15 each
The total sales of Widgets for January = (50 * 10) + (50 * 15) = $1250

This setup is represented in the database as follows:

Sales table
  Sale_ID    Product_ID   Sale_Date
  1          1            2009-01-01
  2          1            2009-01-01
  3          1            2009-01-02
             ...
  50         1            2009-01-15
  51         1            2009-01-16
  52         1            2009-01-17
             ...
  100        1            2009-01-31

Prices table
  Product_ID    Sale_Date    Price
  1             2009-01-01   10.00
  1             2009-01-16   15.00

When a price is defined in the Prices table, it applies to every sale of the given Product_ID from the given Sale_Date onward, until a later price takes effect.

Basically, I'm looking for a query which returns data as follows:

Desired output
  Sale_ID    Product_ID   Sale_Date     Price
  1          1            2009-01-01    10.00
  2          1            2009-01-01    10.00
  3          1            2009-01-02    10.00
             ...
  50         1            2009-01-15    10.00
  51         1            2009-01-16    15.00
  52         1            2009-01-17    15.00
             ...
  100        1            2009-01-31    15.00

I have the following query:

SELECT
    Sale_ID,
    Product_ID,
    Sale_Date,
    (
        SELECT TOP 1 Price
        FROM Prices
        WHERE
            Prices.Product_ID = Sales.Product_ID
            AND Prices.Sale_Date <= Sales.Sale_Date
        ORDER BY Prices.Sale_Date DESC
    ) as Price
FROM Sales

This works, but is there a more efficient query than a nested sub-select?

And before you point out that it would just be easier to include "price" in the Sales table, I should mention that the schema is maintained by another vendor and I'm unable to change it. And in case it matters, I'm using SQL Server 2000.
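
To make the setup reproducible, here is a minimal sketch that loads the sample data and checks the January total. It uses Python's built-in sqlite3 module purely as a stand-in for SQL Server 2000, so LIMIT 1 plays the role of TOP 1. The exact sale dates within each half of the month are assumed, since the tables above elide them. The date comparison is written as <= so that a sale on the day a price takes effect picks up that price.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales  (Sale_ID INTEGER PRIMARY KEY, Product_ID INTEGER, Sale_Date TEXT);
    CREATE TABLE Prices (Product_ID INTEGER, Sale_Date TEXT, Price REAL);
    INSERT INTO Prices VALUES (1, '2009-01-01', 10.00);
    INSERT INTO Prices VALUES (1, '2009-01-16', 15.00);
""")

# Assumption: sales 1-50 fall on Jan 1-15 and sales 51-100 on Jan 16-31.
jan1 = date(2009, 1, 1)
sales = []
for sale_id in range(1, 101):
    if sale_id <= 50:
        offset = (sale_id - 1) % 15          # day offsets 0..14
    else:
        offset = 15 + (sale_id - 51) % 16    # day offsets 15..30
    sales.append((sale_id, 1, (jan1 + timedelta(days=offset)).isoformat()))
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?)", sales)

# Same shape as the query in the question; LIMIT 1 stands in for TOP 1.
PRICED_SALES = """
    SELECT Sale_ID, Product_ID, Sale_Date,
           (SELECT Price
              FROM Prices
             WHERE Prices.Product_ID = Sales.Product_ID
               AND Prices.Sale_Date <= Sales.Sale_Date
             ORDER BY Prices.Sale_Date DESC
             LIMIT 1) AS Price
      FROM Sales
"""

priced = conn.execute(PRICED_SALES + " ORDER BY Sale_ID").fetchall()

# Roll the per-sale prices up into one total per product for January.
monthly_totals = conn.execute(
    "SELECT Product_ID, SUM(Price) FROM (" + PRICED_SALES + ") AS priced "
    "WHERE Sale_Date >= '2009-01-01' AND Sale_Date < '2009-02-01' "
    "GROUP BY Product_ID"
).fetchall()

print(priced[0], priced[50])
print(monthly_totals)
```

With this data the total for product 1 comes to 1250, matching the arithmetic above.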

Upvotes: 3

Answers (5)

Jersey Dude

Reputation: 607

I agree with Sean. The code you have written is very clean and understandable. If you are having performance issues, then take the extra effort to make the code faster. Otherwise, you are making the code more complex for no reason. Nested sub-selects are extremely useful when used judiciously.

Upvotes: 0

Sam Saffron

Reputation: 131112

If you start storing start and end dates, or create a view that includes the start and end dates (you can even create an indexed view), then you can greatly simplify your query, provided you are certain there are no range overlaps.

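SELECT
    s.Sale_ID,
    s.Product_ID,
    s.Sale_Date,
    p.Price
FROM Sales s
JOIN Prices p
    ON p.Product_ID = s.Product_ID
    AND s.Sale_Date >= p.StartDate
    AND s.Sale_Date < p.EndDate
-- careful not to use BETWEEN here: it includes both ends,
-- so a sale on a price-change date would match two ranges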

Note:

A technique along these lines will let you do this with a view. Note that if you need to index the view, it will have to be reworked quite a bit.

create table t (d datetime)

insert t values(getdate())
insert t values(getdate()+1)
insert t values(getdate()+2)

go
create view myview
as
select start = isnull(max(t2.d), '1975-1-1'), finish = t1.d from t t1
left join t t2 on t1.d > t2.d
group by t1.d
go

select * from myview

start                   finish
----------------------- -----------------------
1975-01-01 00:00:00.000 2009-01-27 11:12:57.383
2009-01-27 11:12:57.383 2009-01-28 11:12:57.383
2009-01-28 11:12:57.383 2009-01-29 11:12:57.383
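
As a rough sketch of how the same idea could be applied to the Prices table from the question, the snippet below derives one range per price row and then does the range join. PriceRanges is a hypothetical view name, the '9999-12-31' sentinel for the open-ended last range is an assumption, and the four sample sales are illustrative. Unlike the view above, each range here starts on the date the price takes effect and ends at the next price change. Python's sqlite3 module is used only so the SQL can be run standalone; COALESCE is used in place of ISNULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales  (Sale_ID INTEGER PRIMARY KEY, Product_ID INTEGER, Sale_Date TEXT);
    CREATE TABLE Prices (Product_ID INTEGER, Sale_Date TEXT, Price REAL);
    INSERT INTO Prices VALUES (1, '2009-01-01', 10.00);
    INSERT INTO Prices VALUES (1, '2009-01-16', 15.00);
    INSERT INTO Sales VALUES (1, 1, '2009-01-01');
    INSERT INTO Sales VALUES (2, 1, '2009-01-15');
    INSERT INTO Sales VALUES (3, 1, '2009-01-16');
    INSERT INTO Sales VALUES (4, 1, '2009-01-31');

    -- Hypothetical view: each price row becomes a half-open range
    -- [StartDate, EndDate) that ends at the product's next price change.
    CREATE VIEW PriceRanges AS
    SELECT p1.Product_ID,
           p1.Price,
           p1.Sale_Date AS StartDate,
           COALESCE(MIN(p2.Sale_Date), '9999-12-31') AS EndDate
      FROM Prices p1
      LEFT JOIN Prices p2
        ON p2.Product_ID = p1.Product_ID
       AND p2.Sale_Date > p1.Sale_Date
     GROUP BY p1.Product_ID, p1.Price, p1.Sale_Date;
""")

ranges = conn.execute(
    "SELECT Product_ID, Price, StartDate, EndDate "
    "FROM PriceRanges ORDER BY StartDate"
).fetchall()

# Start is inclusive and end is exclusive, so a sale on a price-change
# date matches exactly one range.
priced = conn.execute("""
    SELECT s.Sale_ID, s.Sale_Date, r.Price
      FROM Sales s
      JOIN PriceRanges r
        ON r.Product_ID = s.Product_ID
       AND s.Sale_Date >= r.StartDate
       AND s.Sale_Date <  r.EndDate
     ORDER BY s.Sale_ID
""").fetchall()

print(ranges)
print(priced)
```

The half-open range is a design choice: it keeps adjacent ranges from overlapping on the day a price changes, which is the overlap problem mentioned above.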

Upvotes: 2

dkretz

Reputation: 37655

It's best to avoid these kinds of correlated subqueries. Here's a classic technique for such cases.

SELECT
    s.Sale_ID,
    s.Product_ID,
    s.Sale_Date,
    p1.Price
FROM Sales AS s
LEFT JOIN Prices AS p1 ON s.Product_ID = p1.Product_ID
    AND s.Sale_Date >= p1.Sale_Date
LEFT JOIN Prices AS p2 ON s.Product_ID = p2.Product_ID
    AND s.Sale_Date >= p2.Sale_Date
    AND p2.Sale_Date > p1.Sale_Date
WHERE p2.Price IS NULL  -- want this one not to be found

Join the pricing table a second time as p2 with a left outer join, and keep only the rows where p2 comes back NULL. A NULL there shows that no later price exists on or before the sale date, so the product-price record matched in p1 is the most recent one.

(I would have inner-joined the first price match, but if there is none, it's nice to have the product show up anyway so you know there's a problem.)
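
For anyone who wants to try the technique, here is a small sketch that runs the double left join against a few sample rows. The data is illustrative: product 2 is a made-up product with no price row, included to show the NULL case mentioned above. Python's sqlite3 module is used only as a convenient way to run the SQL standalone, not as a claim about how it performs on SQL Server 2000.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales  (Sale_ID INTEGER PRIMARY KEY, Product_ID INTEGER, Sale_Date TEXT);
    CREATE TABLE Prices (Product_ID INTEGER, Sale_Date TEXT, Price REAL);
    INSERT INTO Prices VALUES (1, '2009-01-01', 10.00);
    INSERT INTO Prices VALUES (1, '2009-01-16', 15.00);
    INSERT INTO Sales VALUES (1, 1, '2009-01-01');
    INSERT INTO Sales VALUES (2, 1, '2009-01-15');
    INSERT INTO Sales VALUES (3, 1, '2009-01-16');
    INSERT INTO Sales VALUES (4, 1, '2009-01-31');
    -- Hypothetical product 2 has no row in Prices at all.
    INSERT INTO Sales VALUES (5, 2, '2009-01-10');
""")

rows = conn.execute("""
    SELECT s.Sale_ID, s.Product_ID, s.Sale_Date, p1.Price
      FROM Sales AS s
      -- p1: every price in effect on or before the sale date
      LEFT JOIN Prices AS p1
        ON s.Product_ID = p1.Product_ID
       AND s.Sale_Date >= p1.Sale_Date
      -- p2: any price newer than p1 that is still on or before the sale date
      LEFT JOIN Prices AS p2
        ON s.Product_ID = p2.Product_ID
       AND s.Sale_Date >= p2.Sale_Date
       AND p2.Sale_Date > p1.Sale_Date
     WHERE p2.Price IS NULL   -- keep only rows where no newer price exists
     ORDER BY s.Sale_ID
""").fetchall()

for row in rows:
    print(row)
```

Each sale comes back exactly once, and the sale for the unpriced product appears with a NULL price (None in Python) rather than disappearing.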

Upvotes: 2

Sean Bright

Reputation: 120644

Are you actually running into performance problems or are you just anticipating them? I would implement this exactly as you have, were my hands tied from a schema-modification standpoint as yours are.

Upvotes: 0

Eduard Wirch

Reputation: 9922

The combination of Product_ID and Sale_Date is your foreign key. Try a select-join on Product_ID, Sale_Date.

Upvotes: -1
