What technique should I use for Optimizing the SQL Query

Question

Hi I have a stored procedure that is used to fetch records while searching. This procedure returns millions of records. However there was a bug found inside the search procedure which also return duplicate records in some scenario when certain condition are met. I have found the error why it was returning duplicate records: Below is the query that is in question:

With cteAutoApprove (AcctID, AutoApproved,DecisionDate)                
AS (
select 
    A.AcctID,
    CAST(autoEnter AS SMALLINT) AS AutoApproved, 
    DecisionDate 
from 
(
    SELECT 
        awt.AcctID, 
        MIN(awt.dtEnter) AS DecisionDate
    FROM
        dbo.AccountWorkflowTask awt 
        JOIN dbo.WorkflowTask wt ON awt.WorkflowTaskID = wt.WorkflowTaskID
        Join Task T on T.TaskID = wt.TaskID
    WHERE
        (
            (T.TaskStageID = 3 and awt.ReasonIDExit is NULL) 
            OR (wt.TaskID IN (9,15,201,208,220,308,319,320,408,420,508,608,620,1470,1608,1620))
        )
    GROUP BY 
        awt.AcctID
) A 
Join AccountWorkflowTask awt1 
    on awt1.dtEnter=A.DecisionDate and awt1.AcctID=a.AcctID
),

This CTE was returning duplicate record because of the condition on awt1.dtEnter=A.DecisionDate the dtEnter for some account was exactly same. This is the reason it returned duplicate record.

My question is what should I use to prevent this. I cannot use Distinct here as it will definitely slow down the search procedure. Shall I use Rank or Dense Rank so that it is optimized and the query takes less time to execute the result? Or some other technique? Please help as I am actually stuck here

Me.Name · Accepted Answer

It does seem like a good candidate for row_number (not rank, with the same dates on the same acctid, you'd still have multiple records) Obviously I can't test the query here, but winging it:

select 
    A.AcctID,
    CAST(autoEnter AS SMALLINT) AS AutoApproved, 
    DecisionDate 
from 
(
    SELECT 
        awt.AcctID, 
        awt.dtEnter AS DecisionDate,
        autoEnter,
        row_number() over (partition by awt.acctid order by awt.dtEnter) rnr
    FROM
        dbo.AccountWorkflowTask awt 
        JOIN dbo.WorkflowTask wt ON awt.WorkflowTaskID = wt.WorkflowTaskID
        Join Task T on T.TaskID = wt.TaskID
    WHERE
        (
            (T.TaskStageID = 3 and awt.ReasonIDExit is NULL) 
            OR (wt.TaskID IN (9,15,201,208,220,308,319,320,408,420,508,608,620,1470,1608,1620))
        )
) A 
where rnr = 1

This way, the group by is no longer necessary: getting the first date is done by row_number. Neither is the second join, the subquery already contains all the data (and the optimizer is smart enough not to do anything with the rows it doesn't need)

PS. because sql server window functions work incredibly efficient, using row_number instead of the min() - join construction, will most likely gain a performance boost, even if there were no double rows.

What technique should I use for Optimizing the SQL Query

Answers (1)

Related Questions