Reputation: 2772
We recently inherited a PHP-based website codebase, and there's an SQL query which makes the mysqld process unresponsive for ~50 seconds while it runs (and takes up 100% CPU on top
). It involves determining which locations are within a given radius by comparing their zipcode's geo-coordinates. Frankly, i can't make heads or tails of why it's so heavy. I thought it might be the heavy use of trig and the sqrt(), but using a different formula had almost no effect. (As a bonus, it didn't even work.)
The salon_locations
table has ~45k entries, but as far as I can tell, none of the other queries take this long. In fact, the name-based search (as opposed to the geographic proximity search above) is practically instantaneous over the same data-set. I'm not very familiar with SQL, so could someone help me figure out what's causing the bottleneck?
I should note that, before the codebase was given to us, it was running perfectly well at its previous home.
SELECT SQL_CALC_FOUND_ROWS
salon.*,
salon_locations.*,
salon_package.logo,
salon_package.searchorder,
salon_package.choice,
salon_packages.siteid,
(SELECT id FROM salon_coupons WHERE salonid = salon.id AND siteid = $siteid AND active = 1 LIMIT 1) AS saloncoupon,
(SELECT AVG(rating) FROM salon_reviews WHERE salonid = salon.id AND siteid = salon_packages.siteid AND approved = 1 GROUP BY salonid,siteid) AS rating,
3956 * 2 * atan2(sqrt(pow((sin(0.0174 * (salon_locations.latitude - $latitude)/2)),2) + cos(0.0174 * $latitude) * cos(0.0174 * salon_locations.latitude) * pow((sin(0.0174 * (salon_locations.longitude - $longitude) / 2)),2)),
sqrt(1 - (pow((sin(0.0174 * (salon_locations.latitude - $latitude) / 2)),2) + cos(0.0174 * $latitude) * cos(0.0174 * salon_locations.latitude) * pow((sin(0.0174 * (salon_locations.longitude - $longitude)/2)),2)))) AS geoCodeDistance
FROM salon_locations
INNER JOIN salon
ON salon_locations.salonid = salon.id
INNER JOIN salon_packages
ON salon_locations.salonid = salon_packages.salonid
INNER JOIN salon_package
ON salon_packages.packageid = salon_package.id
WHERE salon.active = 1
AND salon_locations.latitude != ''
AND salon_locations.longitude != ''
GROUP BY salon.id
HAVING geoCodeDistance <= $radius
ORDER BY salon.salonorder,salon_package.searchorder ASC,geoCodeDistance ASC,RAND()
LIMIT $start,$end;
Upvotes: 2
Views: 853
Reputation: 212412
Use PHP to add a "bounding box" to your SQL query, based on the given radius. See my response to this question for a description of how this works.
EDIT
Basically you precalculate a max and min longitude and latitude based on your radius, then add this into your SQL query
AND salon_locations.latitude != ''
AND salon_locations.latitude BETWEEN $minLatitude and $maxLatitude
AND salon_locations.longitude != ''
AND salon_locations.longitude BETWEEN $minLongitude and $maxLongitude).
This restricts the SQL select to a subset of the salons; and your distance calculation is only performed for that subset rather than the large set that you're currently calculating.
Upvotes: 3
Reputation: 425391
If your table is MyISAM
I would recommend to store the coordinates using Point
datatype, create a spatial index over it and use it in a query:
SELECT *,
FROM salon_locations sl
JOIN …
WHERE MBRContains
(
LineString
(
Point($northing - $radius, $easting - $radius),
Point($northing + $radius, $easting + $radius)
),
sl.location
)
Note that it is better to use UTM
coordinates (metrical easting and northing) instead of lat
and lon
to simplify the calculations. Unfortunately, yuo can use them only within one zone, since MySQL
does not let creating mixing equality and spatial indexes, however, if all you objects are within one hemisphere and are not very close to the poles, you can use your own false eastings and northings which will give you results good enough for small radii (less than 500 km or like this).
Upvotes: 1
Reputation: 107716
The best way to attack these queries is to pre-filter the lat-lon range using the maximum distance per-lat/long * distance (radius) sought. It gives a bounding box where the corners are 40% too far, but is a quick pre-filter that is easy to apply on the lat-lon indices without fully calculating the distance of every single point from the origin.
Upvotes: 1