Mark
Mark

Reputation: 10964

Finding the closest location to a lat and long value

I have two tables t1 and t2 (t1 has 1/10th of the size of t2). Each table has two columns <Lat, Long> that contain the latitude and longitude of some points. For each row in t1 I'd like to find the row in t2 that is the closest to it. What would be the most efficient query for doing this? Does Hive have any sort of libraries for geospatial search?

Upvotes: 2

Views: 981

Answers (1)

Wranorn
Wranorn

Reputation: 798

You'll need to do a bit of trig.

Please refer to this article on Database Journal

The last routine I believe is what you are looking for (you'll need to modify it for your use):

CREATE DEFINER=`root`@`localhost` PROCEDURE closest_restaurants_optimized`
(IN units varchar(5), IN lat Decimal(9,6), IN lon Decimal(9,6), 
IN max_distance SMALLINT, IN limit_rows MEDIUMINT)
BEGIN
    DECLARE ONE_DEGREE_CONSTANT TINYINT;
    DECLARE EARTH_RADIUS_CONSTANT SMALLINT;
    DECLARE lon1, lon2, lat1, lat2 float;  
    IF units = 'miles' THEN
        SET ONE_DEGREE_CONSTANT   = 69;
        SET EARTH_RADIUS_CONSTANT = 3959;
    ELSE -- default to kilometers
        SET ONE_DEGREE_CONSTANT   = 111;
        SET EARTH_RADIUS_CONSTANT = 6371;
    END IF;  
    SET lon1 = lon-max_distance/abs(cos(radians(lat))*ONE_DEGREE_CONSTANT);
    SET lon2 = lon+max_distance/abs(cos(radians(lat))*ONE_DEGREE_CONSTANT);
    SET lat1 = lat-(max_distance/ONE_DEGREE_CONSTANT);
    SET lat2 = lat+(max_distance/ONE_DEGREE_CONSTANT);
    SELECT pm1.post_id, p.post_title,
        ROUND((EARTH_RADIUS_CONSTANT * acos( cos( radians(lat) ) 
            * cos( radians(pm1.meta_value) ) 
            * cos( radians(pm2.meta_value) - radians(lon)) + sin(radians(lat)) 
            * sin( radians(pm1.meta_value)))
        ), 3) AS distance
    FROM goodfood_wp_md20m_postmeta AS pm1, 
        goodfood_wp_md20m_postmeta AS pm2,
        goodfood_wp_md20m_posts    AS p 
    WHERE pm1.meta_key = 'latitude' AND pm2.meta_key = 'longitude'
    AND pm1.post_id = pm2.post_id
    AND pm1.post_id = p.id
    AND p.post_status = 'publish'
    AND pm2.meta_value between lon1 and lon2 
    AND pm1.meta_value between lat1 and lat2
    ORDER BY distance ASC
    LIMIT limit_rows;
END

Upvotes: 2

Related Questions