Peter
Peter

Reputation: 11

SQL Server T-SQL Query Optimization

I am trying to optimize the following T-SQL query:

SELECT Person.*
FROM Person
WHERE ZipCode LIKE '123%'
AND City = 'Washington'
AND NumberOfHomes in (1, 2, 3)
AND
(
    EXISTS
    (
        SELECT * FROM House
        WHERE Person.ID = House.PersonID
        AND House.Type = 'TOWNHOUSE'
        AND House.Size = 'Medium'
    )
    OR
    EXISTS
    (
        SELECT * FROM Color
        WHERE Person.ID = Color.PersonID
        AND Color.Foreground IN ('Green', 'Blue', 'Purple')
    )
)

I'd greatly appreciate any response in optimizing the query.

In particular, is there a way to convert the query into a more efficient query using only a single SELECT statement without any of the inner SELECT statements?

Thanks!

Upvotes: 1

Views: 355

Answers (4)

sam yi
sam yi

Reputation: 4934

Left join and checking for null will be quicker than doing existence checks. Also, if NumberofHomes is an integer, doing BETWEEN will be the same as IN.

SELECT p.*
FROM Person p
LEFT JOIN House h
    ON p.ID = h.PersonID
    AND h.Type = 'TOWNHOUSE'
    AND h.Size = 'Medium'
LEFT JOIN Color c
    ON p.ID = c.PersonID
    AND c.Foreground IN ('Green', 'Blue', 'Purple')
WHERE p.ZipCode LIKE '123%'
  AND p.City = 'Washington'
  AND p.NumberOfHomes BETWEEN 1 AND 3
  AND (h.PersonID is not null or c.PersonID is not null)

OR you can try something like this...

select t.* 
from (
    select personid from house
    where type = 'townhouse' and size = 'medium'
    union
    select personid from color
    where foreground in ('green','blue','purple')
) pid
cross apply (
    select *
    from person p
    where p.id = pid.personid
      and p.zipcode like '123%'
      and p.city = 'washington'
      and p.numberofhomes between 1 and 3
    ) t
where t.id is not null

It's really difficult to optimize these blind. Depending on the distribution of your data, the above query may give you better results.

Upvotes: 0

Mark Anderson
Mark Anderson

Reputation: 11

Often optimizing and having several different select statements are different topics as the query optimizer (SQL Server) often will take your sql statement and run it the way it sees to be the most efficient way it sees fit.

Saying that yes are several different ways you can take your statements and combine them into one sql statement here is an example. This will preserve your person table and get matches from House OR Color tables that match your criteria.

<!-- language:SQL-->
SELECT *
FROM Person Left Outer Join House ON Person.ID = House.PersonID Left Outer Join Color ON
Person.ID= Color.PersonID
WHERE (ZipCode LIKE '123%'
    AND City = 'Washington'
    AND Person.NumberofHomes in (1, 2, 3) )
    AND (
        House.Type = 'TOWNHOUSE'
        AND House.Size = 'Medium'
    )
    OR(
         Color.Foreground IN ('Green', 'Blue', 'Purple')
    )

I would recommend that you reconsider your model. For example, having PersonID in color is very suspect as is having numberofhomes (that could be possibly calculated for example, from a count on the House table that has the person's id). There are some other questionable normalization attributes as well. Not part of your question but I thought you might want to consider it.

Upvotes: -1

MelgoV
MelgoV

Reputation: 656

Please try this:

SELECT p.*
FROM Person p
WHERE Substring(Ltrim(Rtrim(p.ZipCode)),1,3) = '123' AND p.City = 'Washington'AND 
(p.NumberOfHomes=1 or  p.NumberOfHomes=2 or p.NumberOfHomes=3))
AND
(
EXISTS
(
    SELECT 1 FROM House h
    WHERE p.ID = h.PersonID
    AND h.Type = 'TOWNHOUSE'
    AND h.Size = 'Medium'
)
OR
EXISTS
(
    SELECT 1 FROM Color c
    WHERE p.ID = c.PersonID
    AND (c.Foreground ='Green' or c.Foreground='Blue' or  c.Foreground='Purple')
)
);

Also this will work better:

SELECT 
    p.*
FROM Person p
Left join House h
    On (p.Id=h.PersonID)
Left join Color c
    On (p.id=c.PersonID)
WHERE Substring(Ltrim(Rtrim(p.ZipCode)),1,3) = '123' AND p.City = 'Washington'AND 
(p.NumberOfHomes=1 or  p.NumberOfHomes=2 or p.NumberOfHomes=3)) and Isnull(h.Type,'') =   'TOWNHOUSE' AND Isnull(h.Size,'') = 'Medium' AND 
(Isnull(c.Foreground,'') ='Green' or Isnull(c.Foreground,'')='Blue' or Isnull(c.Foreground,'')='Purple') and 
(h.PersonID is not null or  c.PersonID is not null);

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269633

This is the query:

SELECT p.* 
FROM Person p
WHERE p.ZipCode LIKE '123%'  AND p.City = 'Washington' AND p.NumberOfHomes in (1, 2, 3) AND
      (EXISTS (SELECT *
               FROM House h
               WHERE p.ID = h.PersonID AND h.Type = 'TOWNHOUSE' AND h.Size = 'Medium'
             ) OR 
       EXISTS (SELECT *
               FROM Color c
               WHERE p.ID = c.PersonID AND c.Foreground IN ('Green', 'Blue', 'Purple')
              )
      );

Without rewriting the query, you can optimize this with indexes. I would recommend:

Person(City, ZipCode, NumberOfHomes, Id);
House(PersonId, Type, Size);
Color(PersonID, Foreground)

Question, though. Are you sure that the ids in theHouseandColortables really match back toPerson.Id? Normally, they would have a column called something likePersonId`.

Upvotes: 4

Related Questions