Lorenzo Rossi

Reputation: 633

Sorting by distance in MySQL with spatial analysis functions and data types

I'm building a PHP web app with Laravel 5.5 and I need to display a list of places (e.g. stores) sorted by their distance from a user-specified location. The places will be stored in a MySQL database and should be retrieved as Eloquent ORM model instances.

Doing some research I found many posts and questions on this topic (presenting different solutions), but, having very little experience with databases and geolocation/geospatial analysis, they mostly confused me, and I'd like to know what approach to follow and what are the best practices in this case.

Most answers I read suggest using the haversine formula or the spherical law of cosines in the SQL query, which would look something like (example taken from this answer):

$sf = 3.14159 / 180; // scaling factor
$sql = "SELECT * FROM table 
    WHERE lon BETWEEN '$minLon' AND '$maxLon' 
      AND lat BETWEEN '$minLat' AND '$maxLat'
    ORDER BY ACOS(SIN(lat*$sf)*SIN($lat*$sf) + COS(lat*$sf)*COS($lat*$sf)*COS((lon-$lon)*$sf))";

This post points out that, over short distances, assuming the Earth is flat and computing a simple Euclidean distance is a good approximation and is faster than using the haversine formula.
Since I only need to sort places within a single city at a time, this seems to be a good solution.
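
To make the flat-earth idea concrete, here is a rough sketch in Java (the language one of the answers below uses) that compares the haversine distance with an equirectangular approximation. The class name, method names, the 6,371,000 m Earth radius and the sample coordinates are all assumptions chosen for illustration; they do not come from the linked posts.

```java
// Hypothetical helper; class and method names are illustrative only.
final class FlatEarthSketch {
    // Mean Earth radius in meters (an assumed, commonly used value).
    static final double EARTH_RADIUS_M = 6_371_000.0;

    // Great-circle distance in meters via the haversine formula.
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double sinLat = Math.sin(dLat / 2);
        double sinLon = Math.sin(dLon / 2);
        double a = sinLat * sinLat
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2)) * sinLon * sinLon;
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }

    // Flat-earth distance: shrink the longitude difference by cos(mean latitude),
    // then apply the Pythagorean theorem.
    static double flatMeters(double lat1, double lon1, double lat2, double lon2) {
        double meanLat = Math.toRadians((lat1 + lat2) / 2);
        double x = Math.toRadians(lon2 - lon1) * Math.cos(meanLat);
        double y = Math.toRadians(lat2 - lat1);
        return EARTH_RADIUS_M * Math.sqrt(x * x + y * y);
    }

    public static void main(String[] args) {
        // Two illustrative points a short walk apart within one city.
        double h = haversineMeters(41.9028, 12.4964, 41.8902, 12.4922);
        double f = flatMeters(41.9028, 12.4964, 41.8902, 12.4922);
        System.out.printf("haversine: %.2f m, flat: %.2f m%n", h, f);
    }
}
```

For sorting only, the square root and the radius factor in flatMeters could be dropped, since they do not change the order of the results.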

However, most of these posts and SO answers are a few years old and I was wondering if there is now (MySQL 5.7) a better solution.

For example, none of those posts use any of MySQL's “Spatial Analysis Functions”, like ST_Distance_Sphere and ST_Distance, which seem to be designed for exactly that purpose.
Is there any reason (e.g. performance, precision) not to use these functions instead of writing the formula in the query? (I don't know which algorithm these functions use internally.)

I also don't know how I should store the coordinates of each place. Most of the examples I've seen assume the coordinates are stored in separate lat, lon columns as doubles or as FLOAT(10,6) (as in this example by Google), but MySQL's POINT data type also seems appropriate for storing geographic coordinates.
What are the pros and cons of these two approaches?

How can indexes be used to speed up this kind of query? For example, I've read about “spatial indexes”, but I think they can only be used to limit the results with something like MBRContains(), not to actually order the results by distance.

So, how should I store the coordinates of places and how should I query them to be ordered by distance?

Upvotes: 7

Views: 3200

Answers (3)

HopeKing

Reputation: 3503

Use ST_Distance_Sphere to get the distance between points, or MBRContains to get the points within a bounding rectangle. Both are much faster than writing the haversine formula in the query: the formula can't use indexes, it is not built for querying distances, and MySQL is slow with range queries. Refer to the MySQL documentation.

The haversine formula is probably fine for small applications. Most of the older answers refer to that solution because older versions of MySQL's InnoDB engine did not have spatial indexes.

The broad method is as follows. The code below is from my working code in Java; I hope you can tailor it for PHP as needed.

  1. First, save the incoming data as a Point in the database. (Note that the coordinates follow the longitude, latitude convention, in that order.)

        GeometryFactory factory = new GeometryFactory();
        Point point = factory.createPoint(new Coordinate(officeDto.getLongitude(), officeDto.getLatitude()));//IMP:Longitude,Latitude
        officeDb.setLocation(point);
    
  2. Create a spatial index in MySQL with the following statement:

    CREATE SPATIAL INDEX location ON office (location);

You might get the error "All parts of a SPATIAL index must be NOT NULL". That is because a spatial index can only be created on a column declared NOT NULL; in that case, alter the column to be NOT NULL first.

  3. Finally, call the built-in function ST_Distance_Sphere from your code as follows:

    SELECT st_distance_sphere( office.getLocation ,  project.getLocation) 
     as distance FROM ....
    

Note: office.getLocation and project.getLocation both return POINT types. The native SQL signature, from the documentation, is:

ST_Distance_Sphere(g1, g2 [, radius]) 

which returns the minimum spherical distance between two points and/or multipoints on a sphere, in meters, or NULL if any geometry argument is NULL or empty.
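
Since step 1 depends on getting the longitude, latitude order right, here is a small sketch for anyone not using a geometry library: it builds the WKT text for a point with longitude first. The class and method names are hypothetical, and passing the resulting string to MySQL's ST_GeomFromText as a bound parameter is an assumption about how you would use it on MySQL 5.7, the version in the question.

```java
// Hypothetical helper; not part of any library mentioned above.
final class WktPointSketch {
    // Builds "POINT(lon lat)" text. The parameters are named so the call site reads
    // naturally as (latitude, longitude), but the output puts longitude first.
    static String toWkt(double latitude, double longitude) {
        // Range checks catch some swapped-argument mistakes (and reject NaN).
        if (!(latitude >= -90.0 && latitude <= 90.0)) {
            throw new IllegalArgumentException("latitude out of range: " + latitude);
        }
        if (!(longitude >= -180.0 && longitude <= 180.0)) {
            throw new IllegalArgumentException("longitude out of range: " + longitude);
        }
        // Locale.ROOT forces '.' as the decimal separator regardless of system locale.
        return String.format(java.util.Locale.ROOT, "POINT(%.6f %.6f)", longitude, latitude);
    }

    public static void main(String[] args) {
        // Assumed usage: bind this string to a placeholder such as ST_GeomFromText(?).
        System.out.println(toWkt(41.9028, 12.4964));
    }
}
```

The range checks cannot detect every swap (a pair like 41.9, 12.5 is valid either way round), so they are only a partial safeguard.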

Upvotes: 1

Joseph_J

Reputation: 3669

I use a table I found that associates a latitude and longitude with each zip code. I use the haversine formula to find all zip codes within a certain range, then take the list of zip codes returned by that query and find all businesses with those zip codes. Maybe that solution will work for you; it was pretty easy to implement. It also eliminates the need to know the latitude and longitude of each business, as long as you know its zip code.

Upvotes: 1

Rick James

Reputation: 142356

Other than the ST_Distance_Sphere, 5.7 does not bring anything extra to the table. (SPATIAL was already implemented.)

For 'thousands' of points, the code you have is probably the best. Include these indexes:

INDEX(lat, lng),
INDEX(lng, lat)
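
Those indexes can only help when the WHERE clause restricts lat and lng to a range, as the BETWEEN conditions in the question's query do. A minimal sketch in Java of deriving such bounds from a center point and a radius follows; the class name, the Earth radius value and the array layout are assumptions for illustration, and the sketch deliberately ignores the poles and the dateline.

```java
// Hypothetical helper; names and the radius constant are illustrative assumptions.
final class BoundingBoxSketch {
    // Mean Earth radius in meters (assumed value).
    static final double EARTH_RADIUS_M = 6_371_000.0;

    // Returns {minLat, maxLat, minLon, maxLon} in degrees for a box that
    // encloses a circle of radiusMeters around (lat, lon).
    // Not valid near the poles or across the +/-180 degree meridian.
    static double[] boundingBox(double lat, double lon, double radiusMeters) {
        // Angular size of the radius along a meridian.
        double dLat = Math.toDegrees(radiusMeters / EARTH_RADIUS_M);
        // Parallels get shorter by cos(latitude), so the longitude span grows.
        double dLon = Math.toDegrees(
                radiusMeters / (EARTH_RADIUS_M * Math.cos(Math.toRadians(lat))));
        return new double[] { lat - dLat, lat + dLat, lon - dLon, lon + dLon };
    }

    public static void main(String[] args) {
        // A 5 km box around an illustrative city-center coordinate.
        double[] box = boundingBox(41.9028, 12.4964, 5_000.0);
        System.out.printf("lat %.5f..%.5f, lon %.5f..%.5f%n", box[0], box[1], box[2], box[3]);
    }
}
```

The four values would be bound to the minLat, maxLat, minLon and maxLon placeholders of the range conditions, with the distance expression then sorting only the rows inside the box.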

And I would not worry about the curvature of the earth unless your distances span thousands of miles (or km). Even then, the code and that function should be good enough.

Do not use FLOAT(m,n), use only FLOAT. The link below gives the precision available to FLOAT and other representations.

If you have so many points that you can't cache the table and its indexes entirely (many millions of points), you could use this, which uses a couple of tricks to avoid the lengthy scans that the solution above requires. Because of PARTITION limitations, lat/lng are represented as scaled integers. (But that is easy enough to convert on input and output.) The earth's curvature, poles, and dateline are all handled.
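
The scaled-integer conversion mentioned above can be sketched in a few lines of Java. The scale factor of 10,000 (roughly 11 m of resolution per unit of latitude) and all names here are assumptions chosen for illustration, not necessarily what the linked solution uses.

```java
// Hypothetical helper; the scale factor is an assumption, not taken from the linked code.
final class ScaledCoordSketch {
    // 1 unit = 1/10000 of a degree, about 11 m of latitude.
    static final int SCALE = 10_000;

    // Degrees to scaled integer, rounded to the nearest unit.
    static int toScaled(double degrees) {
        return (int) Math.round(degrees * SCALE);
    }

    // Scaled integer back to degrees.
    static double fromScaled(int scaled) {
        return scaled / (double) SCALE;
    }

    public static void main(String[] args) {
        int stored = toScaled(41.9028);
        System.out.println(stored + " -> " + fromScaled(stored));
    }
}
```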

Upvotes: 5
