Nick N.
Nick N.

Reputation: 13559

SQL query taking too long, are there any optimizations possible, need to be pointed into right direction

Question

I recently learned about CTE's and how to use them for performance winning, however I think I am overusing them in this case. The below query is taking about 1 minute with only 1000 examinations and I am sure it can be done much faster. Please suggest optimizations or terms that can point me in the right direction?

Context

To understand the query, you need to have some context so here it is. In the database we have examinations which will be performed on an object (E.G. a pen needs to be examinated if it can write). Before an examination can start an examination must be acknowledged so there is the acknowledged date. Of course we also save when an examination is created (createdate). When an examination is ready (Ready Date) it needs to get a result. A result can be approved (isapproved) or rejected (isrejected). The meaning of this query is to create graph which shows the 5 conditions of examinations per day.

Counts needed per day:

  1. examinations_created_count
  2. examinations_acknowledged_count
  3. examinations_ready_count
  4. examinations_ready_and_approved_count
  5. examinations_ready_and_rejected_count

Actual Query

DECLARE @ProjectID [bigint] = 3
DECLARE @LocationID [bigint] = 2
DECLARE @PersonID [bigint] = 1058
DECLARE @StartDate [datetime] = '2012-01-01' 
DECLARE @EndDate [datetime]= '2014-12-31'

;WITH ExaminationCTE(acknowledgedate, readydate, createdate, isrejected, isapproved) 
AS
(
    SELECT [acknowledgedate], [readydate], createdate, [isrejected], [isapproved]  FROM [dbo].[Examination]
    LEFT JOIN [dbo].[Object] [o] ON [dbo].[Examination].[fk_objectid] = [o].[pk_objectid]
    LEFT JOIN [dbo].[ExaminationResults] [er]  ON [dbo].[Examination].[fk_examinationid] = [er].[pk_examinationid]
    WHERE [fk_projectid] = @ProjectID 
    AND [fk_locationid] = @LocationID 
    AND [fk_personid] = @PersonID
),
dates(Date) AS 
(
    --Get all the dates between the selected range
    SELECT @StartDate as Date
    UNION ALL
    SELECT DATEADD(d,1,[Date])
    FROM dates 
    WHERE DATE < @EndDate
)

-- Select all the counts from the examination CTE
SELECT d.Date,
       (SELECT COUNT(*) FROM [ExaminationCTE] WHERE [createdate] <= [d].[Date]) AS 'examinations_created_count',
       (SELECT COUNT(*) FROM [ExaminationCTE] WHERE [acknowledgedate] <= [d].[Date]) AS 'examinations_acknowledged_count',
       (SELECT COUNT(*) FROM [ExaminationCTE] WHERE [readydate] <= [d].[Date]) AS 'examinations_ready_count',
       (SELECT COUNT(*) FROM [ExaminationCTE] WHERE [readydate] <= [d].[Date] AND [isapproved] = 1) AS 'examinations_ready_and_approved_count',
       (SELECT COUNT(*) FROM [ExaminationCTE] WHERE [readydate] <= [d].[Date] AND [isrejected] = 1) AS 'examinations_ready_and_rejected_count'      
FROM dates d
OPTION (MAXRECURSION 0)

While the query is working at the moment it isn't meeting my performance specs. I feel this can be done neater.

UPDATE: Due to the help of @GordonLinoff I'm almost there. However I need to fill the NULL rows with the last possible value.

This image shows what the null values must be

In the image you can see the results are null the arrows, points out what the value should be, for the first set of NULL I need to get the value from the last available date before the selected start date.

Upvotes: 1

Views: 372

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269563

Yes, you can make this more efficient. You are doing a whole bunch of correlated non-equijoins and then counting the results.

Take a different approach. One simple approach would be to summarize ExaminationCTE by date, to summarize the results. However, in SQL Server, the CTE isn't materialized so this might not change the performance.

Another way is to do the count using window functions. Here is an example for the first field:

from dates left outer join
     (select createdate, max(seqnum) as examinations_created_count
      from (select createdate, row_number() over (order by createdate) as seqnum
            from ExaminationCTE
           ) e
      group by createdate
     ) ecc
     on dates.date = ecc.createdate 

The others would be similiar. In fact, you can put all the variables in one subquery.

The logic is to add a sequential number to each row and then take the maximum number per date. This is the count.

Upvotes: 1

Related Questions