Reputation: 10051
I have several points of interest from dataframe 1 as follows:
City District BusinessArea LATB LNGB
BJ 39.891239 116.333334
BJ 39.893203 116.365832
BJ 39.936265 116.359406
BJ 39.88723 116.399005
BJ 39.882956 116.35425
I want to search nearest location's District and BusinessArea from dataframe 2 to these places and fill them in corresponding columns in dataframe 1. How can I realize this in Python? Thanks.
City District BusinessArea LATB LNGB
BJ CY CBD 39.958953 116.521695
BJ DC ADM 39.959331 116.417026
BJ HD ANZ 40.050313 116.328861
BJ XAXQ AX 38.878255 115.886731
BJ CY AZ 39.979982 116.407959
BJ CY ALPKGY 40.00121 116.399127
BJ SJS BBS 39.920912 116.243273
BJ YQ BDL 40.367837 115.983509
BJ SJS BDC 39.955215 116.194778
BJ SJS BJ 39.91896 116.205016
This is by far I've realized. More work need to be done.
import pandas as pd
import numpy as np
import math
from math import *
EARTH_REDIUS = 6378.137
def rad(d):
return d * pi / 180.0
def getDistance(lat1, lng1, lat2, lng2):
radLat1 = rad(lat1)
radLat2 = rad(lat2)
a = radLat1 - radLat2
b = rad(lng1) - rad(lng2)
s = 2 * math.asin(math.sqrt(math.pow(sin(a/2), 2) + cos(radLat1) * cos(radLat2) * math.pow(sin(b/2), 2)))
s = s * EARTH_REDIUS
return s
if __name__ == '__main__':
business = pd.read_excel("df2.xlsx")
match_place = pd.read_excel("df1.xlsx")
res = pd.DataFrame()
for i in range(business.shape[0]):
for j in range(5):
res.at[j,business.at[i,"BusinessArea"]] = getDistance(business.at[i,"LATB"],business.at[i,"LNGB"]
,match_place.at[j,"LATB"],match_place.at[j,"LNGB"])
print(res.columns[np.argmin(np.array(res), 1)])
Upvotes: 2
Views: 201
Reputation: 6665
What about doing (and using your geodesic getDistance
function as it is)
cols_to_get = ('District', 'BusinessArea')
dmat = df1.apply(
lambda r1: df2.apply(
lambda r2: getDistance(r1.LATB, r1.LNGB, r2.LATB, r2.LNGB),
axis=1
),
axis=1
)
df1.ix[:, cols_to_get] = df2.ix[
dmat.idxmin(axis=1), cols_to_get
].values
print(df1)
# returns:
# City District BusinessArea LATB LNGB
#0 BJ SJS BBS 39.891239 116.333334
#1 BJ DC ADM 39.893203 116.365832
#2 BJ DC ADM 39.936265 116.359406
#3 BJ DC ADM 39.887230 116.399005
#4 BJ DC ADM 39.882956 116.354250
Note that using the method ix
may henceforth return a deprecation warning, such as
DeprecationWarning:
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing
Upvotes: 2