Reputation: 10964
I have two tables t1
and t2
(t1
has 1/10th of the size of t2
). Each table has two columns <Lat, Long>
that contain the latitude and longitude of some points. For each row in t1
I'd like to find the row in t2
that is the closest to it. What would be the most efficient query for doing this? Does Hive have any sort of libraries for geospatial search?
Upvotes: 2
Views: 981
Reputation: 798
You'll need to do a bit of trig.
Please refer to this article on Database Journal
The last routine I believe is what you are looking for (you'll need to modify it for your use):
CREATE DEFINER=`root`@`localhost` PROCEDURE closest_restaurants_optimized`
(IN units varchar(5), IN lat Decimal(9,6), IN lon Decimal(9,6),
IN max_distance SMALLINT, IN limit_rows MEDIUMINT)
BEGIN
DECLARE ONE_DEGREE_CONSTANT TINYINT;
DECLARE EARTH_RADIUS_CONSTANT SMALLINT;
DECLARE lon1, lon2, lat1, lat2 float;
IF units = 'miles' THEN
SET ONE_DEGREE_CONSTANT = 69;
SET EARTH_RADIUS_CONSTANT = 3959;
ELSE -- default to kilometers
SET ONE_DEGREE_CONSTANT = 111;
SET EARTH_RADIUS_CONSTANT = 6371;
END IF;
SET lon1 = lon-max_distance/abs(cos(radians(lat))*ONE_DEGREE_CONSTANT);
SET lon2 = lon+max_distance/abs(cos(radians(lat))*ONE_DEGREE_CONSTANT);
SET lat1 = lat-(max_distance/ONE_DEGREE_CONSTANT);
SET lat2 = lat+(max_distance/ONE_DEGREE_CONSTANT);
SELECT pm1.post_id, p.post_title,
ROUND((EARTH_RADIUS_CONSTANT * acos( cos( radians(lat) )
* cos( radians(pm1.meta_value) )
* cos( radians(pm2.meta_value) - radians(lon)) + sin(radians(lat))
* sin( radians(pm1.meta_value)))
), 3) AS distance
FROM goodfood_wp_md20m_postmeta AS pm1,
goodfood_wp_md20m_postmeta AS pm2,
goodfood_wp_md20m_posts AS p
WHERE pm1.meta_key = 'latitude' AND pm2.meta_key = 'longitude'
AND pm1.post_id = pm2.post_id
AND pm1.post_id = p.id
AND p.post_status = 'publish'
AND pm2.meta_value between lon1 and lon2
AND pm1.meta_value between lat1 and lat2
ORDER BY distance ASC
LIMIT limit_rows;
END
Upvotes: 2