Reputation: 17
I have two different datasets (samples below):
I am trying to find the minimum distance between each site in dataset 1 and the sites in dataset 2, so that each location in dataset 1 has a column showing the distance to the closest site in dataset 2.
So far I have this, but I can't get it to work. Any advice on how to proceed is appreciated.
from geopy import distance
import pandas as pd
s = {
    'site_id': dataset1['site_id'],
    'latitude': dataset1['latitude'],
    'longitude': dataset1['longitude']
}
d = {
    'site_id': dataset2['site_id'],
    'latitude': dataset2['latitude'],
    'longitude': dataset2['longitude']
}
#s = pd.DataFrame(s)
#d = pd.DataFrame(d)
for (ss, a) in s.items():
    best = None
    dist = None
    for (dd, b) in d.items():
        km = distance.distance(a, b).km
        if dist is None or km < dist:
            best = dd
            dist = km
    print(f'{ss} is nearest {best}: {dist} km')
Upvotes: 0
Views: 332
Reputation: 6639
It looks like there is another column in the dataframe called 'site_id', since you are reading it into your s and d variables:
s = {
    'site_id': dataset1['site_id'],
    'latitude': dataset1['latitude'],
    'longitude': dataset1['longitude']
}
So it would seem that you would be comparing site_id in the formula:
km = distance.distance(a, b).km
Also, a and b need to be tuples of lat/long, which doesn't seem likely to be the case given how they are extracted from the s and d series.
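As a rough illustration of the point above, the sketch below shows what iterating over a dictionary of columns yields. Plain lists stand in for the pandas columns here (an assumption, to keep the example self-contained); the behaviour of .items() is the same in that each iteration hands back a column name and the whole column, not one site.

```python
# Plain lists stand in for the pandas columns (an assumption for illustration).
s = {
    'site_id': ['Site1', 'Site2'],
    'latitude': [51.8236, 52.4157],
    'longitude': [-3.01961, -4.08358],
}

# Iterating the dict yields one (column name, whole column) pair per key,
# not one entry per site.
keys = [key for key, column in s.items()]
print(keys)

# Pairing the columns row by row gives one (lat, long) tuple per site,
# which is the shape a distance function expects.
points = {
    site: (lat, lon)
    for site, lat, lon in zip(s['site_id'], s['latitude'], s['longitude'])
}
print(points)
```

So in the original loop, ss takes the values 'site_id', 'latitude' and 'longitude', and a is an entire column each time.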
Are your dataframes more like this?
DF1
| site_id | latitude | longitude |
|---------|----------|-----------|
| Site1   | 51.8236  | -3.019610 |
| Site2   | 52.4157  | -4.083580 |
| Site3   | 57.1478  | -2.098000 |
| Site4   | 56.4617  | -2.991410 |
| Site5   | 51.2490  | -0.764848 |
| Site6   | 57.1438  | -2.109280 |
| Site7   | 51.6707  | -1.282660 |

DF2
| site_id | latitude | longitude |
|---------|----------|-----------|
| SiteA   | 51.8313  | -2.27422  |
| SiteB   | 50.4891  | -3.55259  |
| SiteC   | 56.5792  | -3.34735  |
| SiteD   | 57.1492  | -2.08277  |
| SiteE   | 57.2875  | -2.37346  |
| SiteF   | 57.1558  | -2.11278  |
| SiteG   | 57.1967  | -2.09314  |
| SiteH   | 57.1538  | -2.27820  |
| SiteI   | 53.7527  | -2.36054  |
| SiteJ   | 55.8659  | -3.97845  |
If so, you want to create two dictionaries of tuples where the site_id is the key and the tuple of lat/long is the value, as in s_dict and d_dict below. For example:
s_dict = {
    'Site1': (51.8236, -3.01961),
    'Site2': (52.4157, -4.08358),
    'Site3': (57.1478, -2.098),
    'Site4': (56.4617, -2.99141),
    ...
}
You can then take the source lat/long tuple for each site, compare it to each destination tuple, and keep the smallest distance.
from geopy import distance
import pandas as pd
# Dataframes...dataset1 and dataset2 sourced
### Create dictionary of tuples based on the example dataframes shown above
s_dict = {x[0]: (x[1:]) for x in dataset1.itertuples(index=False)}
d_dict = {x[0]: (x[1:]) for x in dataset2.itertuples(index=False)}
for s_site, s in s_dict.items():
    print(f'Checking site: {s_site} Co-Ords: {s}')
    best = None
    dist = None
    for d_site, d in d_dict.items():
        ### s and d are tuples (lat/long co-ords)
        km = distance.distance(s, d).km
        print(f'Comparing {s_site} to {d_site}, Co-ords: {d}, Distance: {km}')
        if dist is None or km < dist:
            best = d_site
            dist = km
    print(f'{s_site}: The nearest site is {best}: {dist} km')
This should give output like the following, including the added print line for each comparison:
Checking site: Site1 Co-Ords: (51.8236, -3.01961)
Comparing Site1 to SiteA, Co-ords: (51.8313, -2.27422), Distance: 51.39541157179988
Comparing Site1 to SiteB, Co-ords: (50.4891, -3.55259), Distance: 153.07461731514346
Comparing Site1 to SiteC, Co-ords: (56.5792, -3.34735), Distance: 529.7691869147437
Comparing Site1 to SiteD, Co-ords: (57.1492, -2.08277), Distance: 595.8983925872216
Comparing Site1 to SiteE, Co-ords: (57.2875, -2.37346), Distance: 609.6418219993236
Comparing Site1 to SiteF, Co-ords: (57.1558, -2.11278), Distance: 596.4352710313524
Comparing Site1 to SiteG, Co-ords: (57.1967, -2.09314), Distance: 601.0900671080333
Comparing Site1 to SiteH, Co-ords: (57.1538, -2.2782), Distance: 595.2575358222653
Comparing Site1 to SiteI, Co-ords: (53.7527, -2.36054), Distance: 219.22839170745868
Comparing Site1 to SiteJ, Co-ords: (55.8659, -3.97845), Distance: 454.30856091686644
Site1: The nearest site is SiteA: 51.39541157179988 km
Upvotes: 1
Reputation: 15
You can use the Haversine formula to calculate the distance between two points given their latitude and longitude coordinates. Here's an example of how you can modify your code to use the Haversine formula:
from math import radians, sin, cos, sqrt, atan2
def haversine(lat1, lon1, lat2, lon2):
    # convert decimal degrees to radians
    lat1, lon1, lat2, lon2 = map(radians, [lat1, lon1, lat2, lon2])
    # haversine formula
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    # 6371 is the mean Earth radius in km
    km = 6371 * c
    return km
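Update your code with the following, looping over the dataframe rows so that each row exposes its own latitude and longitude:
# Iterate over rows of the dataframes; s.items() and d.items() would yield
# whole columns, so a['latitude'] would not be a single coordinate.
for _, a in dataset1.iterrows():
    best = None
    dist = None
    for _, b in dataset2.iterrows():
        km = haversine(a['latitude'], a['longitude'], b['latitude'], b['longitude'])
        if dist is None or km < dist:
            best = b['site_id']
            dist = km
    print(f"{a['site_id']} is nearest {best}: {dist} km")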
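Since the question asks for a column of distances rather than printed lines, here is a minimal sketch of collecting the results instead. It assumes the sites are held as dictionaries of (lat, long) tuples keyed by site_id; the helper name nearest_sites is hypothetical, and the coordinates are a few of the example values quoted earlier in this thread.

```python
from math import radians, sin, cos, sqrt, atan2

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, assuming a spherical Earth of radius 6371 km."""
    lat1, lon1, lat2, lon2 = map(radians, [lat1, lon1, lat2, lon2])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 6371 * 2 * atan2(sqrt(a), sqrt(1 - a))

def nearest_sites(sources, destinations):
    """Map each source site to (nearest destination site, distance in km).

    Hypothetical helper: both arguments are dicts of site_id -> (lat, long).
    """
    result = {}
    for s_site, (s_lat, s_lon) in sources.items():
        # Pick the destination with the smallest distance to this source.
        result[s_site] = min(
            ((d_site, haversine(s_lat, s_lon, d_lat, d_lon))
             for d_site, (d_lat, d_lon) in destinations.items()),
            key=lambda pair: pair[1],
        )
    return result

# A few coordinates from the example data quoted in this thread
s_dict = {'Site1': (51.8236, -3.01961), 'Site3': (57.1478, -2.098)}
d_dict = {
    'SiteA': (51.8313, -2.27422),
    'SiteB': (50.4891, -3.55259),
    'SiteD': (57.1492, -2.08277),
}

nearest = nearest_sites(s_dict, d_dict)
for site, (best, km) in nearest.items():
    print(f'{site} is nearest {best}: {km:.2f} km')
```

With pandas, the result could then be attached as a column, for instance dataset1['nearest_km'] = dataset1['site_id'].map({k: v[1] for k, v in nearest.items()}), where nearest_km is just an illustrative column name. Expect the haversine figures to differ slightly from geopy's distance.distance, which as far as I know works on an ellipsoid rather than a sphere.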
Upvotes: 1