PBJ
PBJ

Reputation: 384

Optimize query with self-joins and group bys in where clause

This is my old query:

Select AccountID,AccountDesc,
    CASE 
    WHEN NOT EXISTS (select AccountID from #B where #B.AccountID = P1.AccountID) THEN cast(cast(GETDATE() as date) as varchar)     
    ELSE cast(cast(CreateDate as date) as varchar)
    END 
    FROM #A P1
    WHERE P1.EndDate = (Select max(EndDate) from #A P2 where P1.AccountID = P2.AccountID and P2.Categ1 = 'F'
    group by P2.AccountID) 
    and P1.StartDate = (Select max(StartDate) from #A P2 where P1.AccountID = P2.AccountID and P2.Categ1 = 'F'
    group by P2.AccountID) 
    and P1.CreateDate = (Select max(CreateDate ) from #A P2 where P1.AccountID = P2.AccountID and P2.Categ1 = 'F'
    group by P2.AccountID) 
    and P1.NewStartDate= (Select max(NewStartDate) from #A P2 where P1.AccountID = P2.AccountID and P2.Categ1 = 'F'
    group by P2.AccountID)

I wanted to get rid of the repetitive self-joins and group bys in the where clause, so I got rid of the "Categ1 = F" part and put it up in the work table #A's where clause instead when I am inserting into it. And, I figured the self-join was only for getting the MAX values, so since I made the work table I didn't think I would need this self-joins anymore. So, I end up with this new query:

Select AccountID,AccountDesc,
    CASE 
    WHEN NOT EXISTS (select AccountID from #B where #B.AccountID = P1.AccountID) THEN cast(cast(GETDATE() as date) as varchar)     
    ELSE cast(cast(CreateDate as date) as varchar)
    END 
    FROM #A P1
    WHERE P1.EndDate = (Select max(EndDate) from #A)
    and P1.StartDate = (Select max(StartDate) from #A) 
    and P1.CreateDate = (Select max(CreateDate) from #A) 
    and P1.NewStartDate= (Select max(NewStartDate) from #A)

But this is not returning the same number of records, I believe because I also deleted the "Group By" clauses in the where conditions?

Can someone help me make this original query faster, but still return the same results?

Upvotes: 0

Views: 44

Answers (3)

KeithL
KeithL

Reputation: 5594

you can create a subquery and join that in so the calcs on happen once.

;with cte as
(
select Account,max(EndDate) ED ,max(StartDate) SD,max(CreateDate ) CD ,max(NewStartDate) NSD 
from #a
where Categ1 = 'F'
group by Account
)
Select AccountID,AccountDesc,
    CASE 
    WHEN NOT EXISTS (select AccountID from #B where #B.AccountID = P1.AccountID) THEN cast(cast(GETDATE() as date) as varchar)     
    ELSE cast(cast(CreateDate as date) as varchar)
    END 
 FROM #A p1
     join cte on p1.Account=cte.Account
           and p1.StartDate=cte.SD
           and p1.EndDate = cte.ED
           and p1.CreateDate = cte.CD 
           and p1.NewStartDate=cte.NSD

here is another enhancement:

CASE 
    WHEN NOT EXISTS (select AccountID from #B where #B.AccountID = P1.AccountID) THEN cast(cast(GETDATE() as date) as varchar)     
    ELSE cast(cast(CreateDate as date) as varchar)

can get changed by left joining #B

;with cte as
(
select Account,max(EndDate) ED ,max(StartDate) SD,max(CreateDate ) CD ,max(NewStartDate) NSD 
from #a
where Categ1 = 'F'
group by Account
)
Select AccountID,AccountDesc,
    case when p2.Account is not null 
         then cast(CreateDate as varchar(10)) 
         else cast(cast(getdate() as date) as varchar(10)) end
 FROM #A p1
     join cte on p1.Account=cte.Account
           and p1.StartDate=cte.SD
           and p1.EndDate = cte.ED
           and p1.CreateDate = cte.CD 
           and p1.NewStartDate=cte.NSD
     left join #B p2 on p1.Account=p2.Account

Upvotes: 0

Mahesh.K
Mahesh.K

Reputation: 901

I Believe this will help you out

Select AccountID,AccountDesc,
     CASE WHEN NOT EXISTS (select AccountID from #B where #B.AccountID = P1.AccountID) THEN cast(cast(GETDATE() as date) as varchar)     
     ELSE cast(cast(CreateDate as date) as varchar) END 
  FROM #A P1
    CROSS APPLY (
                  SELECT max(EndDate) EndDate, max(StartDate) StartDate, max(CreateDate ) CreateDate ,max(NewStartDate)NewStartDate 
                  FROM #A P2
                  WHERE P1.AccountID = P2.AccountID and P2.Categ1 = 'F'
                 ) AS P3
    WHERE P1.EndDate = P3.EndDate AND P1.StartDate = P3.StartDate
    AND P1.CreateDate = P3.CreateDate AND P1.NewStartDate = p3.NewStartDate

Upvotes: 0

Thorsten Kettner
Thorsten Kettner

Reputation: 94894

To start with: All those GROUP BY AccountID clauses in your query are superfluous, as the subqueries always only select data for one AccountID anyway.

What's been tried in the WHERE clause is this:

WHERE (AccountID, EndDate, StartDate, CreateDate, NewStartDate) IN
(
  SELECT AccountID, MAX(EndDate), MAX(StartDate), MAX(CreateDate), MAX(NewStartDate)
  FROM #A
  WHERE Categ1 = 'F'
  GROUP BY AccountID
)

SQL Server, however, doesn't support this syntax. So convert this to EXISTS:

WHERE EXISTS
(
  SELECT 1
  FROM #A P2
  WHERE P2.AccountID = P1.AccountID AND P2.Categ1 = 'F'
  HAVING P1.EndDate = MAX(P2.EndDate)
     AND P1.StartDate = MAX(P2.StartDate)
     AND P1.CreateDate = MAX(P2.CreateDate)
     AND P1.NewStartDate = MAX(P2.NewStartDate) 
)

Upvotes: 1

Related Questions