Twisol
Twisol

Reputation: 2772

SQL geographic proximity query takes almost a minute to execute

We recently inherited a PHP-based website codebase, and there's an SQL query which makes the mysqld process unresponsive for ~50 seconds while it runs (and takes up 100% CPU on top). It involves determining which locations are within a given radius by comparing their zipcode's geo-coordinates. Frankly, i can't make heads or tails of why it's so heavy. I thought it might be the heavy use of trig and the sqrt(), but using a different formula had almost no effect. (As a bonus, it didn't even work.)

The salon_locations table has ~45k entries, but as far as I can tell, none of the other queries take this long. In fact, the name-based search (as opposed to the geographic proximity search above) is practically instantaneous over the same data-set. I'm not very familiar with SQL, so could someone help me figure out what's causing the bottleneck?

I should note that, before the codebase was given to us, it was running perfectly well at its previous home.

SELECT SQL_CALC_FOUND_ROWS
  salon.*,
  salon_locations.*,
  salon_package.logo,
  salon_package.searchorder,
  salon_package.choice,
  salon_packages.siteid,
  (SELECT id FROM salon_coupons WHERE salonid = salon.id AND siteid = $siteid AND active = 1 LIMIT 1) AS saloncoupon,
  (SELECT AVG(rating) FROM salon_reviews WHERE salonid = salon.id AND siteid = salon_packages.siteid AND approved = 1 GROUP BY salonid,siteid) AS rating,
  3956 * 2 * atan2(sqrt(pow((sin(0.0174 * (salon_locations.latitude - $latitude)/2)),2) + cos(0.0174 * $latitude) * cos(0.0174  * salon_locations.latitude) * pow((sin(0.0174 * (salon_locations.longitude - $longitude) / 2)),2)),
  sqrt(1 - (pow((sin(0.0174  * (salon_locations.latitude - $latitude) / 2)),2) + cos(0.0174  * $latitude) * cos(0.0174  * salon_locations.latitude) * pow((sin(0.0174  * (salon_locations.longitude - $longitude)/2)),2)))) AS geoCodeDistance
  FROM salon_locations
  INNER JOIN salon
  ON salon_locations.salonid = salon.id
  INNER JOIN salon_packages
  ON salon_locations.salonid = salon_packages.salonid
  INNER JOIN salon_package
  ON salon_packages.packageid = salon_package.id
  WHERE salon.active = 1
  AND salon_locations.latitude != ''
  AND salon_locations.longitude != ''
  GROUP BY salon.id
  HAVING geoCodeDistance <= $radius
  ORDER BY salon.salonorder,salon_package.searchorder ASC,geoCodeDistance ASC,RAND()
  LIMIT $start,$end;

Upvotes: 2

Views: 853

Answers (3)

Mark Baker
Mark Baker

Reputation: 212412

Use PHP to add a "bounding box" to your SQL query, based on the given radius. See my response to this question for a description of how this works.

EDIT

Basically you precalculate a max and min longitude and latitude based on your radius, then add this into your SQL query

AND salon_locations.latitude != '' 
AND salon_locations.latitude BETWEEN $minLatitude and $maxLatitude
AND salon_locations.longitude != '' 
AND salon_locations.longitude BETWEEN $minLongitude and $maxLongitude).

This restricts the SQL select to a subset of the salons; and your distance calculation is only performed for that subset rather than the large set that you're currently calculating.

Upvotes: 3

Quassnoi
Quassnoi

Reputation: 425391

If your table is MyISAM I would recommend to store the coordinates using Point datatype, create a spatial index over it and use it in a query:

SELECT  *, 
FROM    salon_locations sl
JOIN    …
WHERE   MBRContains
                (
                LineString
                        (
                        Point($northing - $radius, $easting - $radius),
                        Point($northing + $radius, $easting + $radius)
                        ),
                sl.location
                )

Note that it is better to use UTM coordinates (metrical easting and northing) instead of lat and lon to simplify the calculations. Unfortunately, yuo can use them only within one zone, since MySQL does not let creating mixing equality and spatial indexes, however, if all you objects are within one hemisphere and are not very close to the poles, you can use your own false eastings and northings which will give you results good enough for small radii (less than 500 km or like this).

Upvotes: 1

RichardTheKiwi
RichardTheKiwi

Reputation: 107716

The best way to attack these queries is to pre-filter the lat-lon range using the maximum distance per-lat/long * distance (radius) sought. It gives a bounding box where the corners are 40% too far, but is a quick pre-filter that is easy to apply on the lat-lon indices without fully calculating the distance of every single point from the origin.

Upvotes: 1

Related Questions