Reputation: 13145
My database is stored in a sql server 2005 db.
This query takes less than one second to execute:
SELECT * FROM (
SELECT ROW_NUMBER() OVER ( ORDER BY tblOrders.orderid ) AS RowNum,
SUM(tblProducts.Price) as price
FROM tblOrders
LEFT OUTER JOIN tblOrderDetails ON tblOrders.orderid = tblOrderDetails.OrderId
LEFT OUTER JOIN tblProducts ON tblOrderDetails.ProductId = tblProducts.ProductId
GROUP BY tblOrders.orderid
) as x
where RowNum >= 21001 and RowNum < 21011
while this queries takes 10 seconds to execute:
SELECT * FROM (
SELECT ROW_NUMBER() OVER ( ORDER BY tblOrders.orderid ) AS RowNum,
SUM(tblProducts.Price) as price,
OrderDate
FROM tblOrders
LEFT OUTER JOIN tblOrderDetails ON tblOrders.orderid = tblOrderDetails.OrderId
LEFT OUTER JOIN tblProducts ON tblOrderDetails.ProductId = tblProducts.ProductId
GROUP BY tblOrders.orderid, tblOrders.OrderDate
) as x
where RowNum >= 21001 and RowNum < 21011
Why might there be such a difference?
All tables have a column called id which holds the primary key. Not sure why orderid and ProductId exist also since I didn't design the database.
/Barry
UPDATE
OrderDate is a datetime
SECOND UPDATE
Remember, the three tables each have an id column which acts as the primary key. However, orderid, productid, etc are used when referencing between tables. I'm not sure why it was implemented this way, but I'm guessing its very much wrong.
tblOrders:
Id; int; no null; PK
OrderId; int; allow null
OrderDate; datetime; allow null
tblOrderDetails:
Id; int; no null; PK
OrderId; int; allow null
ProductId; int; allow null
tblProducts:
Id; int; PK; no null
ProductId; allow null
Price; money; allow null
Is this adequate re a query execution plan?-
THIRD UPDATE
This only takes one second to execute -
SELECT ROW_NUMBER() OVER ( ORDER BY tblOrders.orderid ) AS RowNum,
SUM(tblProducts.Price) as price,
OrderDate
FROM tblOrders
LEFT OUTER JOIN tblOrderDetails ON tblOrders.orderid = tblOrderDetails.OrderId
LEFT OUTER JOIN tblProducts ON tblOrderDetails.ProductId = tblProducts.ProductId
GROUP BY tblOrders.orderid, OrderDate
and this only 2 seconds -
SELECT * FROM (
SELECT ROW_NUMBER() OVER ( ORDER BY tblOrders.orderid ) AS RowNum,
SUM(tblProducts.Price) as price,
MAX(tblOrders.OrderDate) as OrderDate -- do this instead of grouping
FROM tblOrders
LEFT OUTER JOIN tblOrderDetails ON tblOrders.orderid = tblOrderDetails.OrderId
LEFT OUTER JOIN tblProducts ON tblOrderDetails.ProductId = tblProducts.ProductId
GROUP BY tblOrders.orderid ) as x
But this takes 10 seconds --
SELECT * FROM (
SELECT ROW_NUMBER() OVER ( ORDER BY tblOrders.orderid ) AS RowNum,
SUM(tblProducts.Price) as price,
MAX(tblOrders.OrderDate) as OrderDate -- do this instead of grouping
FROM tblOrders
LEFT OUTER JOIN tblOrderDetails ON tblOrders.orderid = tblOrderDetails.OrderId
LEFT OUTER JOIN tblProducts ON tblOrderDetails.ProductId = tblProducts.ProductId
GROUP BY tblOrders.orderid ) as x
where RowNum >= 21001 and RowNum < 21011
The where clause is adding 8 seconds. Why?
Upvotes: 0
Views: 939
Reputation: 171206
This cannot be answered without executions plans, but I can guess:
Update: The execution plan that you posts is horrific indeed.
Create indexes:
create unique nonclustered index x0 on tblOrder(orderid) include (OrderDate)
create unique nonclustered index x1 on tblProduct (productid) include (Price)
create nonclustered index x2 on tblOrderDetails(orderid, ProductId)
Upvotes: 0
Reputation: 280431
What is OrderDate? datetime? While those queries look very similar, I suspect OrderDate includes time information, so the sorting and grouping is much more expensive (and lead to many more rows in the subquery for the second query).
Consider the following change:
SELECT RowNum, price, DD = DATEADD(DAY, DD, '19000101') FROM (
SELECT ROW_NUMBER() OVER ( ORDER BY tblOrders.orderid ) AS RowNum,
SUM(tblProducts.Price) as price,
DATEDIFF(DAY, '19000101', tblOrders.OrderDate) as DD
FROM tblOrders
LEFT OUTER JOIN tblOrderDetails ON tblOrders.orderid = tblOrderDetails.OrderId
LEFT OUTER JOIN tblProducts ON tblOrderDetails.ProductId = tblProducts.ProductId
GROUP BY tblOrders.orderid, DATEDIFF(DAY, '19000101', tblOrders.OrderDate)
) as x
where RowNum >= 21001 and RowNum < 21011
ORDER BY RowNum;
In SQL Server 2008 or better you could simplify that to CONVERT(DATE, OrderDate)
...
Upvotes: 0
Reputation: 5248
Without actual table structure and execution plans I can't answer exactly but if orderid is unique in tblOrders than it will be better to remove OrderDate from group by statement and in select list add it as min(tblOrders.OrderDate) as OrderDate
. It should give same result (if tblOrders.orderid is unique key) but work much better.
Upvotes: 0
Reputation: 3636
I'd bet you dollars to doughnuts that including "tblOrders.OrderDate" in both the output list and the grouping clause is causing your slow-down. I suggest you SET STATISTICS IO ON
and run the two queries, and see how you get different scans & seeks on each table.
Very likely the SQL engine has a dramatically different plan for the 2nd query that takes into account the OrderDate column, resulting in either more CPU processing or (more likely) lots more disk IO.
Upvotes: 2