Reputation: 1605

2 similar queries against a large SQL Server table performing differently

I have a huge table in my database that contains distances between cities. This enables my application to find nearby cities around the world when a starting city is selected.

It contains 4 columns:

ID, StartCityID, EndCityID, Distance

and contains about 120 million rows.

I've set up indexes on StartCityID, on EndCityID, one on both, and one each on StartCityID + Distance and EndCityID + Distance (this is my first real experience with indexes, so I'm not 100% sure I'm doing it correctly).
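For reference, the setup described above might look roughly like this in T-SQL. This is a hypothetical reconstruction: the `dbo` schema, the column types, the clustered primary key on ID, the index names and the column order of the "both" index are all assumptions, because the question only lists the columns.

```sql
-- Hypothetical reconstruction of the table described above.
-- Schema, column types and the clustered primary key on ID are assumptions.
CREATE TABLE dbo.Distances
(
    ID          INT IDENTITY(1, 1) NOT NULL
                CONSTRAINT PK_Distances PRIMARY KEY CLUSTERED,
    StartCityID INT   NOT NULL,
    EndCityID   INT   NOT NULL,
    Distance    FLOAT NOT NULL
);

-- The five nonclustered indexes as described (index names are hypothetical;
-- the column order of the "both" index is a guess).
CREATE NONCLUSTERED INDEX IX_Distances_Start          ON dbo.Distances (StartCityID);
CREATE NONCLUSTERED INDEX IX_Distances_End            ON dbo.Distances (EndCityID);
CREATE NONCLUSTERED INDEX IX_Distances_Start_End      ON dbo.Distances (StartCityID, EndCityID);
CREATE NONCLUSTERED INDEX IX_Distances_Start_Distance ON dbo.Distances (StartCityID, Distance);
CREATE NONCLUSTERED INDEX IX_Distances_End_Distance   ON dbo.Distances (EndCityID, Distance);
```

Under these assumptions, only the (StartCityID, EndCityID) index holds both city columns, and because StartCityID is its leading key it can be seeked for a given StartCityID but not for a given EndCityID.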

I run the following two queries:

Select distinct StartCityID
From Distances where EndCityID = 23485

and

Select distinct EndCityID 
From Distances where StartCityID = 20045

Both return the same number of city IDs, but the first takes 35 seconds while the second returns results immediately. When I look at the indexes, they seem to be set up to serve StartCityID and EndCityID in the same way.

Does anyone know why they might behave differently? I'm at a loss.

NB: this may offer more insight. If I run the 35-second query again straight away with the same ID, it also returns results immediately.

Unfortunately that isn't good enough for my website, but it may be useful information.

Thanks

Upvotes: 2

Answers (3)

Gulli Meel

Reputation: 891

The second query is fast because it is covered by an index: you have an index on StartCityID and EndCityID, so everything the query needs is already in that index.

The index on EndCityID is not covering (it doesn't contain StartCityID), so SQL Server either has to join it with other indexes to get the data or has to do a key lookup for each matching row, and that takes time. It also has to do a hash distinct or a sort distinct, whereas the second query doesn't need to, because for a given StartCityID the data in the (StartCityID, EndCityID) index is already sorted in EndCityID order. Also, why use DISTINCT at all? Will you have duplicate StartCityID/EndCityID pairs? If there is no duplicate data, remove the DISTINCT.
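A minimal sketch of the fix this points to: give the slow query the mirror image of the index that makes the fast one fast, keyed on EndCityID with StartCityID as the second key column. A seek on that index returns StartCityID values already sorted for the given EndCityID, so no key lookup and no hash or sort distinct should be needed. The index name and the `dbo` schema are assumptions, and building a new index on 120 million rows takes time and disk space.

```sql
-- Hypothetical index name; mirrors the (StartCityID, EndCityID) index so the
-- EndCityID query is covered too. StartCityID is a key column (not INCLUDE)
-- so rows for one EndCityID come back ordered by StartCityID.
CREATE NONCLUSTERED INDEX IX_Distances_End_Start
    ON dbo.Distances (EndCityID, StartCityID);

-- The slow query can now be answered from this index alone.
SELECT DISTINCT StartCityID
FROM dbo.Distances
WHERE EndCityID = 23485;
```

Once such an index exists, a single-column index on EndCityID is probably redundant, since its key is a prefix of the new one.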

Check the execution plans for these. The first one should show an index seek on the EndCityID + Distance index, most probably followed by a key lookup (it could also be a clustered index scan, depending on the selectivity of the EndCityID value), and then a hash distinct or sort distinct.

The second one should show just an index seek on the StartCityID + EndCityID index.
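One way to see which plan each query gets, without executing it, is `SET SHOWPLAN_TEXT` (the "Display Estimated Execution Plan" button in Management Studio shows the same information graphically). This is a sketch assuming the table is `dbo.Distances`; the operators you actually see depend on your indexes and statistics.

```sql
-- Return the estimated plan as text instead of running the queries.
-- SET SHOWPLAN_TEXT must be the only statement in its batch, hence the GO separators.
SET SHOWPLAN_TEXT ON;
GO

-- Look for an index seek plus a lookup and a hash/sort distinct here...
SELECT DISTINCT StartCityID FROM dbo.Distances WHERE EndCityID = 23485;
-- ...versus a single index seek here.
SELECT DISTINCT EndCityID FROM dbo.Distances WHERE StartCityID = 20045;
GO

SET SHOWPLAN_TEXT OFF;
GO
```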

You mentioned that the second time it returned immediately; that is because the data was already in the cache. So try the following:

DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE, and then run the second query first.

CAUTION: Do not use these on a production server or other critical servers. Try this on a machine where it won't impact other users.
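A minimal sketch of that cold-cache comparison, for a test server only. The `CHECKPOINT` is an addition: `DBCC DROPCLEANBUFFERS` only removes clean pages, so writing dirty pages first makes the cache flush complete. `SET STATISTICS IO` and `SET STATISTICS TIME` report physical reads and elapsed time per query in the Messages output. The `dbo` schema is assumed.

```sql
-- TEST SERVERS ONLY: these commands flush caches for the whole instance.
CHECKPOINT;               -- write dirty pages of the current database to disk
DBCC DROPCLEANBUFFERS;    -- remove clean data pages from the buffer pool
DBCC FREEPROCCACHE;       -- remove all cached execution plans
GO

SET STATISTICS IO ON;     -- logical, physical and read-ahead reads per table
SET STATISTICS TIME ON;   -- CPU and elapsed time per statement
GO

-- The "fast" query first, on a cold cache.
SELECT DISTINCT EndCityID FROM dbo.Distances WHERE StartCityID = 20045;
GO

-- Then the "slow" one; compare physical reads and elapsed time.
SELECT DISTINCT StartCityID FROM dbo.Distances WHERE EndCityID = 23485;
GO

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO
```

If the second query still reads far more pages from disk than the first, the difference lies in the index design rather than in caching.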

Upvotes: 1

AnandPhadke

Reputation: 13506

Try these queries (avoiding the DISTINCT keyword):

Select StartCityID From Distances where EndCityID = 23485 group by StartCityID

Select EndCityID From Distances where StartCityID = 20045 group by EndCityID

Upvotes: 0

Germann Arlington

Reputation: 3353

All you have to do is think about it...

Does your table have a primary key? What is it? What does it mean to have a primary key? What does the DISTINCT keyword ask for?
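A concrete way to answer the DISTINCT question is to check whether any (StartCityID, EndCityID) pair is stored more than once. If none is, DISTINCT cannot change the output of either query, and a unique constraint would enforce that from then on. This is a sketch: the `dbo` schema and the constraint name are assumptions, and both statements touch all 120 million rows.

```sql
-- List any (StartCityID, EndCityID) pair stored more than once.
-- An empty result means DISTINCT cannot change either query's output.
SELECT StartCityID, EndCityID, COUNT(*) AS Copies
FROM dbo.Distances
GROUP BY StartCityID, EndCityID
HAVING COUNT(*) > 1;

-- If the result is empty, this (hypothetically named) constraint prevents
-- duplicates from appearing later; SQL Server backs it with a unique index.
ALTER TABLE dbo.Distances
    ADD CONSTRAINT UQ_Distances_Start_End UNIQUE (StartCityID, EndCityID);
```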

Upvotes: 0
