ah bon
ah bon

Reputation: 10051

Find nearest location by latitude and longitude and fill up District and BusinessArea in Python

I have several points of interest from dataframe 1 as follows:

City    District    BusinessArea      LATB         LNGB
BJ                                  39.891239   116.333334
BJ                                  39.893203   116.365832
BJ                                  39.936265   116.359406
BJ                                  39.88723    116.399005
BJ                                  39.882956   116.35425

I want to search nearest location's District and BusinessArea from dataframe 2 to these places and fill them in corresponding columns in dataframe 1. How can I realize this in Python? Thanks.

City    District    BusinessArea    LATB       LNGB
 BJ        CY            CBD     39.958953  116.521695
 BJ        DC            ADM     39.959331  116.417026
 BJ        HD            ANZ     40.050313  116.328861
 BJ       XAXQ           AX      38.878255  115.886731
 BJ        CY            AZ      39.979982  116.407959
 BJ        CY           ALPKGY   40.00121   116.399127
 BJ        SJS           BBS     39.920912  116.243273
 BJ        YQ            BDL     40.367837  115.983509
 BJ        SJS           BDC     39.955215  116.194778
 BJ        SJS           BJ      39.91896   116.205016

This is by far I've realized. More work need to be done.

import pandas as pd
import numpy as np
import math
from math import *

EARTH_REDIUS = 6378.137

def rad(d):
    return d * pi / 180.0

def getDistance(lat1, lng1, lat2, lng2):
    radLat1 = rad(lat1)
    radLat2 = rad(lat2)
    a = radLat1 - radLat2
    b = rad(lng1) - rad(lng2)
    s = 2 * math.asin(math.sqrt(math.pow(sin(a/2), 2) + cos(radLat1) * cos(radLat2) * math.pow(sin(b/2), 2)))
    s = s * EARTH_REDIUS
    return s

if __name__ == '__main__':

    business = pd.read_excel("df2.xlsx")
    match_place = pd.read_excel("df1.xlsx")
    res = pd.DataFrame()
    for i in range(business.shape[0]):
        for j in range(5):
            res.at[j,business.at[i,"BusinessArea"]] = getDistance(business.at[i,"LATB"],business.at[i,"LNGB"]
                        ,match_place.at[j,"LATB"],match_place.at[j,"LNGB"])
    print(res.columns[np.argmin(np.array(res), 1)])

Upvotes: 2

Views: 201

Answers (1)

keepAlive
keepAlive

Reputation: 6665

What about doing (and using your geodesic getDistance function as it is)

cols_to_get = ('District', 'BusinessArea')

dmat = df1.apply(
    lambda r1: df2.apply(
        lambda r2: getDistance(r1.LATB, r1.LNGB,  r2.LATB, r2.LNGB),
        axis=1
    ),
    axis=1
)


df1.ix[:, cols_to_get] = df2.ix[
    dmat.idxmin(axis=1), cols_to_get
].values

print(df1)
# returns:
#  City District BusinessArea       LATB        LNGB
#0   BJ      SJS          BBS  39.891239  116.333334
#1   BJ       DC          ADM  39.893203  116.365832
#2   BJ       DC          ADM  39.936265  116.359406
#3   BJ       DC          ADM  39.887230  116.399005
#4   BJ       DC          ADM  39.882956  116.354250


Update

Note that using the method ix may henceforth return a deprecation warning, such as

DeprecationWarning: 
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

Upvotes: 2

Related Questions