Nate
Nate

Reputation: 141

Distance Calculation Efficiency MSSQL Geography

I have a SQL query that is running slow and I've determined it is related to distance calculations. I use these calculations to sort so that users looking for items can be presented with results closest to them geographically.

I use the geography function STDistance to calculate the distance from a pre-calculated sql geography data type location.

Location1.STDistance(Location2)

Location1 is based on the user's location, location2 is the location of the item.

Once we get into thousands of rows, this is not performing well, but I don't know of any clever ways to do this. I don't think it would be feasible to store all the possible distance calculations for lookup vs calculating at query time. (That would mean storing the number of unique user locations X unique item locations.)

Locations are determined by zip code. Geographically the scope is limited to the United States.

Any other thoughts?

Upvotes: 1

Views: 543

Answers (1)

tgolisch
tgolisch

Reputation: 6734

The way that people usually deal with slow geo queries is to reduce the set to something small enough that it can be performed in a reasonable amount of time. In your case, people usually employ a technique known as "geo-boxing".

The concept is to find nearby points that fall within a specific lat/lon boundary. For instance. If I want to find all of the people near 42.45678, -22.6543, I would start by determining what is a typical minimum distance. Lets assume it was 25 miles or +- 0.15 degrees lat and lon. I would query for all of those. (Lat between 41.95 and 42.95, Lon between -22.15 and -23.15). Then I would apply the distance function to find the nearest person within my reduced set. The distance calculation is much quicker after I've eliminated points that obviously are not very close.

If my reduced-set seems to be too large, then I can use a smaller box. If my result set returns no rows or too few rows, then I can resort use a recursive algorthm that picks an increasingly larger box until I find a large-enough result set.

The only down-side to this approach is that it has the possibilty of omitting the closest point. Think about a circle that touches the edges of a box vs a box inside of a circle. Points in the corner of the box might get included, but closer points outside the box (on the x axis or y axis) might get excluded. Also Lat/Lon boxes are actually more like trapezoids than squares as you get further away from the equator.

Anyway, if speed is more important than perfect accuracy. Geo-boxing is one approach to consider.

Upvotes: 2

Related Questions