Baz
Baz

Reputation: 13145

Paging and grouping

My database is stored in a sql server 2005 db.

This query takes less than one second to execute:

SELECT * FROM ( 

SELECT  ROW_NUMBER() OVER ( ORDER BY tblOrders.orderid ) AS RowNum,   
SUM(tblProducts.Price) as price
FROM tblOrders 
LEFT OUTER JOIN tblOrderDetails ON tblOrders.orderid = tblOrderDetails.OrderId 
LEFT OUTER JOIN tblProducts ON tblOrderDetails.ProductId = tblProducts.ProductId
GROUP BY tblOrders.orderid

) as x
where RowNum >=  21001 and RowNum <  21011

while this queries takes 10 seconds to execute:

SELECT * FROM ( 

SELECT  ROW_NUMBER() OVER ( ORDER BY tblOrders.orderid ) AS RowNum,   
SUM(tblProducts.Price) as price, 
OrderDate
FROM tblOrders 
LEFT OUTER JOIN tblOrderDetails ON tblOrders.orderid = tblOrderDetails.OrderId 
LEFT OUTER JOIN tblProducts ON tblOrderDetails.ProductId = tblProducts.ProductId
GROUP BY tblOrders.orderid, tblOrders.OrderDate

) as x
where RowNum >=  21001 and RowNum <  21011

Why might there be such a difference?

All tables have a column called id which holds the primary key. Not sure why orderid and ProductId exist also since I didn't design the database.

/Barry

UPDATE

OrderDate is a datetime

SECOND UPDATE

Remember, the three tables each have an id column which acts as the primary key. However, orderid, productid, etc are used when referencing between tables. I'm not sure why it was implemented this way, but I'm guessing its very much wrong.

tblOrders:
Id; int; no null; PK
OrderId; int; allow null
OrderDate; datetime; allow null

tblOrderDetails:
Id; int; no null; PK
OrderId; int; allow null
ProductId; int; allow null

tblProducts:
Id; int; PK; no null
ProductId; allow null
Price; money; allow null

Is this adequate re a query execution plan?-

enter image description here

THIRD UPDATE

This only takes one second to execute -

SELECT  ROW_NUMBER() OVER ( ORDER BY tblOrders.orderid ) AS RowNum,   
SUM(tblProducts.Price) as price, 
OrderDate
FROM tblOrders 
LEFT OUTER JOIN tblOrderDetails ON tblOrders.orderid = tblOrderDetails.OrderId 
LEFT OUTER JOIN tblProducts ON tblOrderDetails.ProductId = tblProducts.ProductId
GROUP BY tblOrders.orderid, OrderDate

and this only 2 seconds -

SELECT * FROM (
  SELECT  ROW_NUMBER() OVER ( ORDER BY tblOrders.orderid ) AS RowNum,
  SUM(tblProducts.Price) as price,
  MAX(tblOrders.OrderDate) as OrderDate  -- do this instead of grouping
FROM tblOrders
  LEFT OUTER JOIN tblOrderDetails ON tblOrders.orderid = tblOrderDetails.OrderId
  LEFT OUTER JOIN tblProducts ON tblOrderDetails.ProductId = tblProducts.ProductId
GROUP BY tblOrders.orderid  ) as x

But this takes 10 seconds --

SELECT * FROM (
  SELECT  ROW_NUMBER() OVER ( ORDER BY tblOrders.orderid ) AS RowNum,
  SUM(tblProducts.Price) as price,
  MAX(tblOrders.OrderDate) as OrderDate  -- do this instead of grouping
FROM tblOrders
  LEFT OUTER JOIN tblOrderDetails ON tblOrders.orderid = tblOrderDetails.OrderId
  LEFT OUTER JOIN tblProducts ON tblOrderDetails.ProductId = tblProducts.ProductId
GROUP BY tblOrders.orderid  ) as x
where RowNum >=  21001 and RowNum <  21011

The where clause is adding 8 seconds. Why?

Upvotes: 0

Views: 939

Answers (4)

usr
usr

Reputation: 171206

This cannot be answered without executions plans, but I can guess:

  • The additional column might prevent use of an index
  • The cardinality of the slow query is very high
  • The statistics for OrderDate are somehow out of date (exec sp_updatestats)

Update: The execution plan that you posts is horrific indeed.

Create indexes:

create unique nonclustered index x0 on tblOrder(orderid) include (OrderDate)
create unique nonclustered index x1 on tblProduct (productid) include (Price)
create nonclustered index x2 on tblOrderDetails(orderid, ProductId)

Upvotes: 0

Aaron Bertrand
Aaron Bertrand

Reputation: 280431

What is OrderDate? datetime? While those queries look very similar, I suspect OrderDate includes time information, so the sorting and grouping is much more expensive (and lead to many more rows in the subquery for the second query).

Consider the following change:

SELECT RowNum, price, DD = DATEADD(DAY, DD, '19000101') FROM (     
SELECT  ROW_NUMBER() OVER ( ORDER BY tblOrders.orderid ) AS RowNum,   
SUM(tblProducts.Price) as price, 
DATEDIFF(DAY, '19000101', tblOrders.OrderDate) as DD
FROM tblOrders 
LEFT OUTER JOIN tblOrderDetails ON tblOrders.orderid = tblOrderDetails.OrderId 
LEFT OUTER JOIN tblProducts ON tblOrderDetails.ProductId = tblProducts.ProductId
GROUP BY tblOrders.orderid, DATEDIFF(DAY, '19000101', tblOrders.OrderDate)

) as x
where RowNum >=  21001 and RowNum <  21011
ORDER BY RowNum;

In SQL Server 2008 or better you could simplify that to CONVERT(DATE, OrderDate)...

Upvotes: 0

oryol
oryol

Reputation: 5248

Without actual table structure and execution plans I can't answer exactly but if orderid is unique in tblOrders than it will be better to remove OrderDate from group by statement and in select list add it as min(tblOrders.OrderDate) as OrderDate. It should give same result (if tblOrders.orderid is unique key) but work much better.

Upvotes: 0

jklemmack
jklemmack

Reputation: 3636

I'd bet you dollars to doughnuts that including "tblOrders.OrderDate" in both the output list and the grouping clause is causing your slow-down. I suggest you SET STATISTICS IO ON and run the two queries, and see how you get different scans & seeks on each table.

Very likely the SQL engine has a dramatically different plan for the 2nd query that takes into account the OrderDate column, resulting in either more CPU processing or (more likely) lots more disk IO.

Upvotes: 2

Related Questions