Martin Sansone - MiOEE
Martin Sansone - MiOEE

Reputation: 4399

Scalability Location Distance Search Over 100,000 LatLng Positions Across USA

Scenario =

1) Delivery Offices spread out across the USA each specifying their own maximum delivery radius limit in miles.

2) A target address geo converted to LatLng is the delivery destination.

Goal = To return a dataset of Delivery Offices (1) who's delivery radius limit falls within this distance to the target address(2)

Attempt =

As a starting point to my problem I am using the excellent work example by the Storm consultancy in identifying the closest office to a customer: Haversine distance between two points

My 'Offices' table stores the office address along with their Lat and Lng values AND also their maximum distance 'deliveryLimit'.

The SQL to calculate the Haversine blows my mind and is currently beyond my understanding!
The Storm SQL is below, but instead of selecting only one row from straight distance calculations, I need to select all offices rows who's max distance delivery limit is less than the distance between the office and the customer.

Question1 = How do I add the max distance limit filter into the SQL query so that it returns the offices with a delivery zone which includes the target location?

Question2 = How can I limit the number of offices queried to those that realistically could be in the target area of the USA? For instance, if the target location is Boise, Idaho and offices in LA, California have a delivery distance limit of 300 miles. There is no point in even querying such offices. However, offices in Washington; Oregon and Northern Nevada states who border Idaho should be included in the search query as some could possibly have maximum distance values which reach this example of Boise, ID.

The SQL for Haversine as used by Storm:

SELECT TOP 1 *, ( 3960 * acos( cos( radians( @custLat ) ) *
cos( radians( Lat ) ) * cos( radians(  Lng  ) - radians( @custLng ) ) +
sin( radians( @custLat ) ) * sin( radians(  Lat  ) ) ) ) AS Distance
FROM Offices
ORDER BY Distance ASC

The SQL example above selects only the closest office to the target lat/long (@custLng)

Storm approached the distance calculation from two different directions. The SQL above was the first. The second was by keeping the office coordinates in an in-memory List and creating a method with function to loop over the List calculating the distances and finally selecting the closest like so:

/// <summary>
/// Returns the distance in miles or kilometers of any two
/// latitude / longitude points.
/// </summary>
/// <param name="pos1">Location 1</param>
/// <param name="pos2">Location 2</param>
/// <param name="unit">Miles or Kilometers</param>
/// <returns>Distance in the requested unit</returns>
public double HaversineDistance(LatLng pos1, LatLng pos2, DistanceUnit unit)
{
    double R = (unit == DistanceUnit.Miles) ? 3960 : 6371;
    var lat = (pos2.Latitude - pos1.Latitude).ToRadians();
    var lng = (pos2.Longitude - pos1.Longitude).ToRadians();
    var h1 = Math.Sin(lat / 2) * Math.Sin(lat / 2) +
              Math.Cos(pos1.Latitude.ToRadians()) * 
              Math.Cos(pos2.Latitude.ToRadians()) *
              Math.Sin(lng / 2) * Math.Sin(lng / 2);
    var h2 = 2 * Math.Asin(Math.Min(1, Math.Sqrt(h1)));
    return R * h2;
}

public enum DistanceUnit { Miles, Kilometers };

and

var Offices = GetMyOfficeList();
for(int i = 0; i< Offices.Count; i++)
{
    Offices[i].Distance = HaversineDistance(
                        coord,
                        new LatLng(Offices[i].Lat, Offices[i].Lng),
                        DistanceUnit.Miles);
}

var closestOffice = Offices.OrderBy(x => x.Distance).Take(1).Single();

Scalability is important as my scenario could easily end up with a lot more than 100,000 office locations, so an in-memory office list option is unlikely!

Upvotes: 3

Views: 483

Answers (3)

Scott Chamberlain
Scott Chamberlain

Reputation: 127593

If you are using Sql2008 or newer it has specific types built in to make what you want to do easier. The main type you will need to use is geography

I am going to make some guesses at your table structure, but the main thing is you have a Location and a DeleveryArea

create table Offices
(
  OfficeName varchar(40),
  Location geography,
  DeliveryDistance float, --stored in miles
  --If you are on SQL2008 or 2008R2 replace BufferWithCurves with one of the older Buffer functions
  DeliveryArea as Location.BufferWithCurves(DeliveryDistance * 1609.34) PERSISTED, --1609.34 converts miles to meters 
)

I used BufferWithCurves in my above example, but that is only available on Sql2012 and newer, if you are on 2008 or 2008R2 you will need to use either BufferWithTolerance or STBuffer or just manually define your own area by hand in the insert statment.

Now to populate the data, because we made DeliveryArea a calculated persisted column it is actually pretty easy to do. All you need to do is put in the location of the office and the radius of its delivery area and it will calculate the circle for the area for you. I will use the examples you provided in your question:

insert into Offices (OfficeName, Location, DeliveryDistance) 
values ('Boise, ID', 
        geography::Point(43.6187102,-116.2146068, 4326), --4326 represents a "lat and long" coordinate system
        300
       )

insert into Offices (OfficeName, Location,DeliveryDistance) 
values ('LA, CA', 
        geography::Point(34.0204989,-118.4117325, 4326),
        300
       )

insert into Offices (OfficeName, Location,DeliveryDistance) 
values ('Walla Walla, WI', 
        geography::Point(46.0627549,-118.3337259, 4326),
        300
       )

insert into Offices (OfficeName, Location,DeliveryDistance) 
values ('Baker City, OR', 
        geography::Point(44.7746169,-117.8317284, 4326),
        300
       )

insert into Offices (OfficeName, Location,DeliveryDistance) 
values ('Elko, NV', 
        geography::Point(40.846931,-115.7669825, 4326),
        300
       )

Now for your query, if you wanted to find which delivery areas serviced Jordan Vally, Oregon (42.9740245,-117.0533247) your query would simply be

declare @cust geography = geography::Point(42.9740245,-117.0533247, 4326)

SELECT OfficeName, 
       Location.STDistance(@cust) / 1609.34 AS Distance, --convert back to miles
       Location.Lat as Lat,
       Location.Long as Lng
FROM Offices
where  DeliveryArea.STContains(@cust) = 1
ORDER BY Distance asc

And this would give you all Offices that the location you chose is within the delivery area. The real nice thing about this system is if instead of calculating DeleveryArea based of the location and delevery range you could actually give it a set of points to outline a non circular geographic area, like a city.

With a well planned Spatial Index this solution should work for even your 100,000 location record set. If you want to learn more about some of the benefits of using geography see this SO question and answer.

Here is a SQL Fiddle with all of the queries I put above shown in a working example.

Upvotes: 3

PJ7
PJ7

Reputation: 787

For a @Raduis in miles:

DECLARE @RadiusInDegrees decimal(18,15)
SET @RadiusInDegrees = @Radius / 69.11 * 1.4142

@Radius / 69.11 gives radius in degrees, but it needs to be multiplied by root 2 to cover the entire square bounding area.

SELECT
    @LatitudeLow = @Latitude - @RadiusInDegrees,
    @LatitudeHigh = @Latitude + @RadiusInDegrees,       
    @LongitudeLow = @Longitude - @RadiusInDegrees,
    @LongitudeHigh = @Longitude + @RadiusInDegrees

Using a bounding area limits the number of times expensive exact distance calculations are made.

[...]
WHERE
    [Latitude] > @LatitudeLow  
    AND [Latitude] < @LatitudeHigh 
    AND [Longitude] > @LongitudeLow
    AND [Longitude] < @LongitudeHigh                
    AND [your exact radius calculation] <= 
      (CASE WHEN [DeliverlyLimit] < @Radius THEN 
          [DeliveryLimit] ELSE @Radius END)

Also for better performance, don't use SELECT * or sub-queries if you can avoid it. Return only IDs, and make a covering index for ID, Latitude, Longitude, and DeliveryLimit. Cache all the data from your table into an array of objects where the index is the ID so matching from the result set is quick.

Upvotes: 1

Jason Fry
Jason Fry

Reputation: 1234

You already have the distance between the office and the target location. Return only those offices whose distance is less than or equal to their delivery radius:

SELECT * FROM (SELECT Id, DeliveryRadius, ( 3960 * acos( cos( radians( @custLat ) ) *
    cos( radians( Lat ) ) * cos( radians(  Lng  ) - radians( @custLng ) ) +
    sin( radians( @custLat ) ) * sin( radians(  Lat  ) ) ) ) AS Distance
    FROM Offices)
WHERE Distance <= DeliveryRadius
ORDER BY Distance ASC

Your second question asks how offices can be partitioned so that a search considers a subset of offices; I would first ensure that your search is not performant enough given your dataset and latency requirements before considering doing this.

Upvotes: 1

Related Questions