Pallavi Verma

Reputation: 85

How to calculate the minimum distance using lat-lon data in Python

I have two data frames. One has user ids with lat-lon data and the other has store codes with store lat-lon data. There are around 89M rows. For each user lat-lon, I want the nearest store code (based on minimum distance).

df1 - 

id          user_lat       user_lon
1           13.031885      80.235574
2           19.099819      72.915288
3           22.226980      84.836070

df2 - 

store_no       s_lat        s_lon
22             29.91         73.88
23             28.57         77.33
24             26.86         80.95

What I have done so far:

import numpy as np
import pandas as pd
from sklearn.neighbors import DistanceMetric
dist = DistanceMetric.get_metric('haversine')

df1 = df1[['user_lat','user_lon']]

df2 = df2[['s_lat','s_lon']]

# Cross join: pair every user with every store
x = pd.merge(df1.assign(k=1), df2.assign(k=1), on='k', suffixes=('1', '2')) \
      .drop(columns='k')

x.head(20)

    user_lat    user_lon    s_lat   s_lon
0   13.031885   80.235574   29.91   73.88
1   13.031885   80.235574   28.57   77.33
2   13.031885   80.235574   26.86   80.95
3   19.099819   72.915288   29.91   73.88
4   19.099819   72.915288   28.57   77.33
5   19.099819   72.915288   26.86   80.95
6   22.226980   84.836070   29.91   73.88
7   22.226980   84.836070   28.57   77.33
8   22.226980   84.836070   26.86   80.95

x['dist'] = np.ravel(dist.pairwise(np.radians(store_lat_lon),np.radians(user_lat_lon)) * 6367)

   user_lat     user_lon    s_lat   s_lon    dist
0   13.031885   80.235574   29.91   73.88   1986.237557
1   13.031885   80.235574   28.57   77.33   1205.217610
2   13.031885   80.235574   26.86   80.95   1386.069611
3   19.099819   72.915288   29.91   73.88   1752.628427
4   19.099819   72.915288   28.57   77.33   1143.731258
5   19.099819   72.915288   26.86   80.95   1031.246453
6   22.226980   84.836070   29.91   73.88   1538.449674
7   22.226980   84.836070   28.57   77.33   1190.620278
8   22.226980   84.836070   26.86   80.95   647.477461

But I want the data frame to look like this:

    user_lat    user_lon    s_lat   s_lon    dist         store_no
0   13.031885   80.235574   29.91   73.88   1986.237557     23
1   13.031885   80.235574   28.57   77.33   1205.217610     23
2   13.031885   80.235574   26.86   80.95   1386.069611     23
3   19.099819   72.915288   29.91   73.88   1752.628427     24
4   19.099819   72.915288   28.57   77.33   1143.731258     24
5   19.099819   72.915288   26.86   80.95   1031.246453     24
6   22.226980   84.836070   29.91   73.88   1538.449674     24
7   22.226980   84.836070   28.57   77.33   1190.620278     24
8   22.226980   84.836070   26.86   80.95   647.477461      24

Upvotes: 0

Views: 806

Answers (1)

Peter Leimbigler

Reputation: 11105

Finding the nearest store of each user is a classic use case for either the k-d tree or ball tree data structures. Scikit-learn implements both, but only the BallTree accepts the haversine distance metric, so we'll use that.

import pandas as pd
import numpy as np
from sklearn.neighbors import BallTree, DistanceMetric

# Set up example data
df1 = pd.DataFrame({'id': [1, 2, 3],
                    'user_lat': [13.031885, 19.099819, 22.22698],
                    'user_lon': [80.235574, 72.915288, 84.83607]})

df2 = pd.DataFrame({'store_no': [22, 23, 24],
                    's_lat': [29.91, 28.57, 26.86],
                    's_lon': [73.88, 77.33, 80.95]})

# Build a ball tree with the haversine distance metric, which expects
# (lat, lon) in radians and returns distances in radians
tree = BallTree(np.radians(df2[['s_lat', 's_lon']].to_numpy()), metric='haversine')

# Query the single nearest store (k=1 by default) for every user
coords = np.radians(df1[['user_lat', 'user_lon']].to_numpy())
dists, ilocs = tree.query(coords)
# dists is in rad; multiply by the Earth's radius (6367 km) to get km
df1['dist'] = dists.flatten() * 6367
# ilocs holds row positions in df2; map them back to store numbers
df1['nearest_store'] = df2.iloc[ilocs.flatten()]['store_no'].values

# Result:
df1
   id   user_lat   user_lon         dist  nearest_store
0   1  13.031885  80.235574  1538.449674             24
1   2  19.099819  72.915288  1143.731258             23
2   3  22.226980  84.836070   647.477461             24
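
To sanity-check the tree's output on a handful of rows, a brute-force scan over all stores is a handy cross-check. The following is only a minimal sketch using the standard library; `haversine_km` and `nearest_store` are illustrative names (not part of scikit-learn), and the 6367 km radius is simply the one used above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6367  # same radius as in the answer above


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_store(user_lat, user_lon, stores):
    """Return (store_no, distance_km) of the closest store by linear scan."""
    candidates = ((no, haversine_km(user_lat, user_lon, lat, lon))
                  for no, lat, lon in stores)
    return min(candidates, key=lambda pair: pair[1])


# Sample data from the question: (id, lat, lon) and (store_no, lat, lon)
users = [(1, 13.031885, 80.235574),
         (2, 19.099819, 72.915288),
         (3, 22.226980, 84.836070)]
stores = [(22, 29.91, 73.88),
          (23, 28.57, 77.33),
          (24, 26.86, 80.95)]

nearest = {uid: nearest_store(lat, lon, stores) for uid, lat, lon in users}
for uid, (store_no, km) in nearest.items():
    print(uid, store_no, round(km, 1))
```

On the three sample users this picks stores 24, 23 and 24, at roughly 1538, 1144 and 647 km. The scan costs one distance evaluation per user-store pair, so it is only suitable for spot checks; at the scale of 89M rows the tree query is the practical choice.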

Upvotes: 2
