Reputation: 115
I have a query that takes about 3 minutes to run. The Query is being used for reporting and i would like it to be more efficient. I think the bottle neck is the inner joins, I could be wrong just want to know if you guys had any solutions.
SELECT DISTINCT
SUBSTRING(T1.DateTime, 1, 4) AS Year,
SUBSTRING(T1.DateTime, 5, 2) AS Month,
SUBSTRING(T1.DateTime, 7, 2) AS Day,
T1.PipeNr, T1.SalesOrder, T1.JobNr, SIST.DefectCode AS DefectRef,
DEFCODES.DefectCode, DEFCODES.DefectDesc, SIST.ODYes, SIST.LocationWrap,
T1.OWS0601, T1.OWS0602, T1.OWS0603, T1.CrossWeld
FROM PIPEDB.dbo.SIST INNER JOIN PIPEDB.dbo.MPIPEID T1 ON PIPEDB.dbo.SIST.PipeNr = T1.PipeNr INNER JOIN
PIPEDB.dbo.DEFCODES ON PIPEDB.dbo.SIST.DefectCode = PIPEDB.dbo.DEFCODES.DefectRef
WHERE PIPEDB.dbo.SIST.DefectCode
IN (
SELECT Top (10) PIPEDB.dbo.SIST.DefectCode
FROM PIPEDB.dbo.SIST INNER JOIN PIPEDB.dbo.MPIPEID T2 ON PIPEDB.dbo.SIST.PipeNr = T2.PipeNr INNER JOIN
PIPEDB.dbo.DEFCODES ON PIPEDB.dbo.SIST.DefectCode = PIPEDB.dbo.DEFCODES.DefectRef
WHERE SUBSTRING(T2.DateTime, 1, 4) = SUBSTRING(T1.DateTime, 1, 4) AND SUBSTRING(T2.DateTime, 5, 2) = SUBSTRING(T1.DateTime, 5, 2) AND
SUBSTRING(T2.DateTime, 7, 2) = SUBSTRING(T1.DateTime, 7, 2)
GROUP BY PIPEDB.dbo.SIST.DefectCode
ORDER BY COUNT(PIPEDB.dbo.SIST.PipeNr) DESC)
AND (PIPEDB.dbo.DEFCODES.DefectDesc IN ("Cut To Remove Defect")) AND
((CASE WHEN T1.CrossWeld = 1 THEN 1 WHEN T1.CrossWeld = 0 THEN 2 END) = @Crossweld OR @Crossweld = 0)
Upvotes: 0
Views: 129
Reputation: 35557
Do you have to have DISTINCT
? I'd rather use GROUP BY
to return a unique set of records e.g
SELECT DISTINCT X
FROM Y
change to
SELECT X
FROM Y
GROUP BY X
Also I prefer using EXISTS
to using IN
in my WHERE
clauses as they are generally marginally faster.
I'd be tempted to put the following section in a CTE
and use a partition function RANK() OVER
instead of the TOP(10)
/ORDER BY
set up
IN (
SELECT Top (10) PIPEDB.dbo.SIST.DefectCode
FROM PIPEDB.dbo.SIST INNER JOIN PIPEDB.dbo.MPIPEID T2 ON PIPEDB.dbo.SIST.PipeNr = T2.PipeNr INNER JOIN
PIPEDB.dbo.DEFCODES ON PIPEDB.dbo.SIST.DefectCode = PIPEDB.dbo.DEFCODES.DefectRef
WHERE SUBSTRING(T2.DateTime, 1, 4) = SUBSTRING(T1.DateTime, 1, 4) AND SUBSTRING(T2.DateTime, 5, 2) = SUBSTRING(T1.DateTime, 5, 2) AND
SUBSTRING(T2.DateTime, 7, 2) = SUBSTRING(T1.DateTime, 7, 2)
GROUP BY PIPEDB.dbo.SIST.DefectCode
ORDER BY COUNT(PIPEDB.dbo.SIST.PipeNr) DESC)
Upvotes: 0
Reputation: 891
The issue is that you are making IN subquery as a correlated subquery which in turn is causing the performance issue.For each rows from the main clause (or outer from) the IN subquery is being executed and thus it is taking time.Each execution of Inner subquery could be expensive based on size of tables and most probably it will use a nested loop join for in subquery. Also if number of rows from outer query are quite high then the expensive IN subquery is being executed lots of time and thus degrading the performance of query.I would suggest that have a look at the logic once again and make sure that it is giving correct data.
See below.I have tried to reproduce your issue.The cost for this query is 2045 which is very huge and off course it is going to take time. It clocks around 10 million io's 176 sec of CPU and 48 seconds elapsed time.
That's not it.I expected to have just 10 productid(I am sure you alos expect to have just 10 defectcodes) but i could see lots of other productids just because of the correlated subquery.
set statistics io,time on
go
select * from Sales.SalesOrderHeader soh inner join sales.SalesOrderDetail sod
on sod.SalesOrderID=soh.SalesOrderID
where sod.ProductID in (
select top 10 ProductID
from Sales.SalesOrderHeader soh1 inner join sales.SalesOrderDetail sod
on sod.SalesOrderID=soh1.SalesOrderID
where datepart(yy,soh.OrderDate) = DATEPART(yy,soh1.OrderDate)
and datepart(mm,soh.OrderDate) = DATEPART(mm,soh1.OrderDate)
)
order by sod.ProductID
Upvotes: 1
Reputation: 2841
Follow these instructions to see it SQL Server recommends any indexes:
Place the query in a query window in Microsoft SQL Server Management Studio.
Instead of running it, press CTRL+L (or select Query > Display Estimated Execution Plan from the menus).
Then in the Execution Plan tab, you may see some text that is green and starts with "Missing Index (Impact xx.xxxx%)..." if you do, right click that text and select Missing Index Details. This will open a new window with the suggested index definition (you have to supply the index name). Example:
/*
Missing Index Details from SQLQuery3.sql - (local).master (sa (55))
The Query Processor estimates that implementing the following index could improve the query cost by 99.3783%.
*/
/*
USE [DatabaseName]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Table] ([Column1])
INCLUDE ([Column2],[Column2])
GO
*/
Replace '<Name of Missing Index, sysname,>' with a valid index name and create the index.
These steps can be repeated until no indexes are suggested.
Beyond that, using SubString in your where clause is probably a problem.
This:
WHERE SUBSTRING(T2.DateTime, 1, 4) = SUBSTRING(T1.DateTime, 1, 4) AND SUBSTRING(T2.DateTime, 5, 2) = SUBSTRING(T1.DateTime, 5, 2) AND SUBSTRING(T2.DateTime, 7, 2) = SUBSTRING(T1.DateTime, 7, 2)
Could definately be this:
WHERE SUBSTRING(T2.DateTime, 1, 8) = SUBSTRING(T1.DateTime, 1, 8)
Or even better:
WHERE LEFT(T2.DateTime, 8) = Left(T1.DateTime, 1, 8)
Or even better (can allow more index usage):
WHERE T2 LIKE Left(T1.DateTime, 1, 8) + '%'
And if these date strings are only 8 characters long, it could be this:
WHERE T2.DateTime = T1.DateTime
If the query still does not run in the desired time, it would be a good move to restructure the data such that SUBSTRING or LEFT are not required in the where clause... i.e. break out the date in it's own column.
Upvotes: 0