Reputation: 13559
Question
I recently learned about CTE's and how to use them for performance winning, however I think I am overusing them in this case. The below query is taking about 1 minute with only 1000 examinations and I am sure it can be done much faster. Please suggest optimizations or terms that can point me in the right direction?
Context
To understand the query, you need to have some context so here it is. In the database we have examinations which will be performed on an object (E.G. a pen needs to be examinated if it can write). Before an examination can start an examination must be acknowledged so there is the acknowledged date. Of course we also save when an examination is created (createdate). When an examination is ready (Ready Date) it needs to get a result. A result can be approved (isapproved) or rejected (isrejected). The meaning of this query is to create graph which shows the 5 conditions of examinations per day.
Counts needed per day:
Actual Query
DECLARE @ProjectID [bigint] = 3
DECLARE @LocationID [bigint] = 2
DECLARE @PersonID [bigint] = 1058
DECLARE @StartDate [datetime] = '2012-01-01'
DECLARE @EndDate [datetime]= '2014-12-31'
;WITH ExaminationCTE(acknowledgedate, readydate, createdate, isrejected, isapproved)
AS
(
SELECT [acknowledgedate], [readydate], createdate, [isrejected], [isapproved] FROM [dbo].[Examination]
LEFT JOIN [dbo].[Object] [o] ON [dbo].[Examination].[fk_objectid] = [o].[pk_objectid]
LEFT JOIN [dbo].[ExaminationResults] [er] ON [dbo].[Examination].[fk_examinationid] = [er].[pk_examinationid]
WHERE [fk_projectid] = @ProjectID
AND [fk_locationid] = @LocationID
AND [fk_personid] = @PersonID
),
dates(Date) AS
(
--Get all the dates between the selected range
SELECT @StartDate as Date
UNION ALL
SELECT DATEADD(d,1,[Date])
FROM dates
WHERE DATE < @EndDate
)
-- Select all the counts from the examination CTE
SELECT d.Date,
(SELECT COUNT(*) FROM [ExaminationCTE] WHERE [createdate] <= [d].[Date]) AS 'examinations_created_count',
(SELECT COUNT(*) FROM [ExaminationCTE] WHERE [acknowledgedate] <= [d].[Date]) AS 'examinations_acknowledged_count',
(SELECT COUNT(*) FROM [ExaminationCTE] WHERE [readydate] <= [d].[Date]) AS 'examinations_ready_count',
(SELECT COUNT(*) FROM [ExaminationCTE] WHERE [readydate] <= [d].[Date] AND [isapproved] = 1) AS 'examinations_ready_and_approved_count',
(SELECT COUNT(*) FROM [ExaminationCTE] WHERE [readydate] <= [d].[Date] AND [isrejected] = 1) AS 'examinations_ready_and_rejected_count'
FROM dates d
OPTION (MAXRECURSION 0)
While the query is working at the moment it isn't meeting my performance specs. I feel this can be done neater.
UPDATE: Due to the help of @GordonLinoff I'm almost there. However I need to fill the NULL rows with the last possible value.
In the image you can see the results are null the arrows, points out what the value should be, for the first set of NULL I need to get the value from the last available date before the selected start date.
Upvotes: 1
Views: 372
Reputation: 1269563
Yes, you can make this more efficient. You are doing a whole bunch of correlated non-equijoins and then counting the results.
Take a different approach. One simple approach would be to summarize ExaminationCTE
by date, to summarize the results. However, in SQL Server, the CTE isn't materialized so this might not change the performance.
Another way is to do the count using window functions. Here is an example for the first field:
from dates left outer join
(select createdate, max(seqnum) as examinations_created_count
from (select createdate, row_number() over (order by createdate) as seqnum
from ExaminationCTE
) e
group by createdate
) ecc
on dates.date = ecc.createdate
The others would be similiar. In fact, you can put all the variables in one subquery.
The logic is to add a sequential number to each row and then take the maximum number per date. This is the count.
Upvotes: 1