Reputation: 17581

Optimize SQL Stored Procedure

A stored procedure that runs a SELECT on a large table is timing out. The where clause is causing the timeout, because I am selecting only cities that are in another table.

AND city IN (SELECT DISTINCT city from tOH where clientId = @clientId)
AND state IN (SELECT DISTINCT state from tOH where clientId = @clientId)

*note almost always only one state will be returned

I am trying to put the cities into a table and then use the table to populate the cities, but I am getting an error that @cities is not declared.

DECLARE @cities TABLE
(
city varchar(200)
);
INSERT INTO @cities (city) SELECT city FROM tOH WHERE clientId = @clientId GROUP BY city

Then my where clause changes to

AND city IN (SELECT city from @cities)

Can anyone figure out a good way of optimizing this stored procedure?

---------------------------- UPDATE ------------------------------------

The joins are all too slow. I think a solution with a temp table or table variable will work.

Upvotes: 0

Answers (12)

Joel Coehoorn

Reputation: 415600

Not only is it slow, it's incorrect.

Say your city is "Evansville, WI", but your tOH table only has entries for "Evansville, IN" and "Milwaukee, WI". You currently check the city and state portions separately, so your existing query will find matches for both "Evansville" and "WI". It will allow that city, even though it really shouldn't.

Do this instead:

INNER JOIN 
  ( 
    SELECT DISTINCT City AS tOHCity, State AS tOHState 
    FROM tOH 
    WHERE ClientID= @ClientID 
  ) cs ON cs.tOHCity = city AND cs.tOHState = state

Note that the subquery is based on the assumption that the DISTINCT from your original post is necessary because you could have more than one of the same city in that table per client. If that's not the case, you can just join to the tOH table directly.

Combine this with proper indexing and you should be good.

Upvotes: 8

catalpa

Reputation: 1982

I'm assuming the reason for the DISTINCT on tOH is that a city name may exist in multiple states and likewise there are multiple occurrences for a state since each state has multiple cities.

If each city and state combination is a unique occurrence it would be more appropriate and cost effective to drop the DISTINCT and do something like the following:

select mytable.* 
from mytable m
inner join tOH t on  t.clientid = @clientId 
    and t.city = m.city and t.state = m.state

Upvotes: 1

Frank V

Reputation: 25419

Change your IN filter to exists. So instead of:

AND city IN (SELECT DISTINCT city from tOH where clientId = @clientId)
AND state IN (SELECT DISTINCT state from tOH where clientId = @clientId)

Change it to something like:

AND EXISTS ( 
      SELECT 0 FROM tOH t WHERE 
      ClientID = @clientId 
      and t.City = parent.ctiy 
      and t.state = parent.state 
)

I've done performance testing and have always found that EXISTS performs faster than IN. Furthermore, since you are doing it twice with the same table, you are taking a double hit.

You should also index city and state in tOH, if possible. Indexes are a give take so make sure you understand what the implications are if you add them.

Upvotes: 0

Arvo

Reputation: 10570

You can use your original query (without joins), making next corrections:

Create index on TOH by clientID, city, state
Remove DISTINCT keyword - this allows optimizer to make efficient use of that index

Upvotes: 0

HLGEM

Reputation: 96552

In answer to your problem of why @cities as a table variable might not work, you haven't shown the rest of the sp but I'll be you somewhere are building dyunamic SQL and executing it. That woudl be out of scope for a previously existing table variable.

Upvotes: 0

C Bauer

Reputation: 5103

I think it might be worthwhile to point out the reason behind the percieved timeout. Your query selects from the original table every single record, then for each record it selects it has to subquery the DISTINCT list of cities in the same table over and over again for each record.

Upvotes: 4

AllenG

Reputation: 8190

You've probably tried this, but my first reaction would be to use populate a temp table with your cities. That may be what you're doing and I'm just not familiar with the syntax, but I've always used:

Create Table #Cities(City varchar(200))

Then you'll fill the temp table and query from it as in your example (INSERT INTO... and AND city IN (SELECT city from #Cities))

Upvotes: 1

KM.

Reputation: 103579

select ....
from ....
    inner join tOH ON ...city=tOH.city and ...state=tOH.state
where ... and tOH.clientId = @clientId

Upvotes: 0

Winston Smith

Reputation: 21884

From SQL Hacks:

When a subquery contains no aggregation functions, chances are you don't need a subquery - you need a JOIN.

So, you should convert your first subquery (AND CITY IN) to a JOIN. Unless you give us the rest of the query, we won't be able to show you exactly how, but the basis of it will be adding City as a table you are selecting from in the main query.

Upvotes: 4

Al W

Reputation: 7713

Use EXISTS instead of IN

AND EXISTS(SELECT 1 FROM tOH WHERE tOH.city=main.city AND clientId=@clientId)

you'll also want to make sure that city is indexed in both tables.

Upvotes: 1

Nathan Koop

Reputation: 25197

You may want to look into placing an index on the State column, but you should do some benchmarks on this. You'll have to weigh the benefit of the index vs the cost when inserting new rows.

You may also want to do the same to the City column.

Upvotes: 1

atfergs

Reputation: 1684

I would try a JOIN on the tOH table and filter the entire query by clientid. You could also use SELECT INTO to throw it in a temp table.

Upvotes: 4

Optimize SQL Stored Procedure

Answers (12)

Related Questions