Reputation: 1105
I have a table containing 42000+ zipcodes, longitude, latitude, and state information. Whats the most accurate and fastest query to return a result with all zipcodes with a 25 mile radius of the zipcode entered?
Current Code (I don't think its accurate)
SELECT
zipcode, (
3959 * acos (
cos ( radians(78.3232) )
* cos( radians( latitude ) )
* cos( radians( longitude ) - radians(65.3234) )
+ sin ( radians(78.3232) )
* sin( radians( latitude ) )
)
) AS distance
FROM Location
HAVING distance < 25
ORDER BY distance
Upvotes: 2
Views: 2490
Reputation: 16325
On Accuracy
The only way to calculate distance accurately is with 3D trig, as you're doing. You can read more on that topic here: https://en.wikipedia.org/wiki/Geographical_distance
Although giving a pretty accurate distance between the lat/lng center-points of zipcodes, those center-points are arbitrarily picked, and the distance is calculated "as the crow flies", so you won't get an accurate representation of actual travel distance between two points within each.
For example, you may have two homes next-door to each other in adjacent zipcodes, or two homes on opposite ends of each zipcode, which will calculate as equidistant given this calculation.
The only way to correct that issue is to calculate address distance, which requires USPS data to map an address to a more specific point, or the use of an API like Google Maps, which will also calculate actual travel distance given available roads.
On Performance
There are a couple ways to speed up your query.
1. Reduce the Real-time Math
The fastest way to do your calculations in real-time is to precalculate and store the expensive trig values in columns in your table, e.g.:
ALTER TABLE Location
ADD COLUMN cos_rad_lat DOUBLE,
ADD COLUMN cos_rad_lng DOUBLE,
ADD COLUMN sin_rad_lat DOUBLE;
Then
UPDATE Location
SET cos_rad_lat = cos(radians(latitude)),
cos_rad_lng = cos(radians(longitude)),
sin_rad_lat = sin(radians(latitude));
Do your cos(radians(78.3232)) type calculations outside the query, so that math isn't done for each row of data.
Thus reducing all calculations to constant values (before getting to SQL) and calculated columns will make your query look like this:
SELECT
zipcode,
3959 * acos(
0.20239077538110228
* cos_rad_lat
* cos_rad_lng - 1.140108408597264
)
+ 0.979304842243025 * sin_rad_lat AS distance
FROM Location
HAVING distance < 25
ORDER BY distance
2. Bounding-box Reduction
Note: You can combine this with method 1.
You could probably increase performance slightly by adding a bounding-box reduction of zips in a subquery before doing the trig, but that may be more complicated than you would like.
For example, instead of:
FROM Location
You could do
FROM (
SELECT *
FROM Location
WHERE latitude BETWEEN A and B
AND longitude BETWEEN C and D
) AS Location
Where A, B, C, and D are numbers corresponding to your center-point +- about 0.3 (As each 10th of a degree of lat/lng corresponds to about 5-7 miles in the US).
This method gets tricky at -180 / 180 Longitude, but that doesn't affect the US.
3. Store All Calculated Distances Another thing you could do is precalculate all distances of all zips, and store then in a separate table
CREATE TABLE LocationDistance (
zipcode1 varchar(5) NOT NULL REFERENCES Location(zipcode),
zipcode2 varchar(5) NOT NULL REFERENCES Location(zipcode)
distance double NOT NULL,
PRIMARY KEY (zipcode1, zipcode2),
INDEX (zipcode1, distance)
);
Populate this table with every combination of zip and their calculated distance.
Your query would then look like:
SELECT zipcode2
FROM LocationDistance
WHERE zipcode1 = 12345
AND distance < 25;
This would by far be the fastest solution, though it involves storing on the order of 1 Billion records.
Upvotes: 2
Reputation: 6449
This may or may not be the fastest, but you can do it by first precomputing the normal vectors (NV) for each coordinate pair and expressing the vector in terms of it's X, Y and Z components:
NV = [Nx, Ny, Nz]
where
Nx = cos(radians(latitude))*cos(radians(longitude))
Ny = cos(radians(latitude))*sin(radians(longitude))
Nz = sin(radians(latitude))
Then the distance between any two coordinates can be computed by determining the difference of the two normal vectors NV1 and NV2 and using Pythagoras equation in three dimensions to get the straight line distance between the two points, that is the chord length C:
C = SQRT(dx^2+dy^2+dz^2)
where
dx = Nx1-Nx2
dy = Ny1-Ny2
dz = Nz1-Nz2
Then the great circle distance can be found with the following formula:
D = arcsin(C/2)*2*R
Where R it the radius of the sphere in this case the earth, which is 3959mi.
Putting it all together:
select pt2.zip
, asin(power(power(pt1.nx-pt2.nx,2)
+power(pt1.ny-pt2.ny,2)
+power(pt1.nz-pt2.nz,2)
,.5)/2)*2*3959 distance
from (select 78.3232 lattitude
, 65.3234 longitude
, cos(radians(78.3232))*cos(radians(65.3234)) nx
, cos(radians(78.3232))*sin(radians(65.3234)) ny
, sin(radians(78.3232)) nz
) pt1
, (select zip
, lattitude
, longitude
, cos(radians(latitude))*cos(radians(longitude)) nx
, cos(radians(latitude))*sin(radians(longitude)) ny
, sin(radians(latitude)) nz
from location) pt2
having distance < 25;
To further optimize this you can calculate a some bounds on the coordinates. Every degree of latitude is roughly equal to 69 miles so you can limit your search to those latitudes ±(D/69). The number of miles per degree of longitude, however, varies with the latitude ranging from about 69 miles per degree at the equator to zero at the poles or 69*cos(latitude), you use ±(D/69*cos(latitude)).
where pt2.latitude between pt1.latitude - 25/69
and pt1.latitude + 25/69
and pt2.longitude between pt1.longitude - 25/(69*cos(radians(abs(pt1.latitude)+25/69)))
and pt1.longitude + 25/(69*cos(radians(abs(pt1.latitude)+25/69)))
Upvotes: 0
Reputation: 48197
Looks like you already know how to calculate distance using Latitud, Logitud
The fastest way is you create the surronding box close to your ZIPCODE
| X-25, Y-25 | | X+25, Y-25 |
X , Y
| X-25, Y+25 | | X+25, Y+25 |
So create 4 variables
Xleft = X - 25miles
Xright = X + 25miles
Ytop = Y - 25miles
Ybottom = Y + 25miles
Then if latitud and longitud have index this query is almost instant
SELECT *
FROM
Location
WHERE
latitud between Xleft AND Xright
AND longitud between Ytop AND Ybottom
You will get some error using a square but you will filter most of the wrong zipcodes. Then you could your original query with a much smaller data set.
Upvotes: 0