Reputation: 29
I have two datasets that I need to join on a distance condition between their coordinates. I've written a function that uses the haversine formula to calculate distance_km, but it takes too long to run.
Dataset 1:
building_id | lat | lng
-------------|-------|--------
1 | 32.11 | -71.22
2 | 32.44 | -72.25
3 | 31.75 | -71.36
Dataset 2:
building_id | lat | lng
------------|-------|--------
4 | 31.65 | -73.52
5 | 32.78 | -70.21
6 | 36.15 | -72.49
Each dataset has over 10,000 buildings, and I would like to match dataset 2 to dataset 1, but only when the distance is less than 0.0075 km.
I am currently iterating through each row of dataset 1 and looking up all lat/lng combinations from dataset 2 to determine the minimum distance:
# Note: keying the dict by lng keeps only one lat per distinct longitude
dataset_2_latlng_dict = dict(zip(dataset_2.lng, dataset_2.lat))
for index, row in dataset_1.iterrows():
    lat = row['lat']
    lng = row['lng']
    all_dist = []
    # Compare this building against every building in dataset 2
    for key, value in dataset_2_latlng_dict.items():
        distance = utils.distance_km(key, value, lng, lat)
        all_dist.append(distance)
    # Store the minimum distance on this row only, not on the whole column
    dataset_1.loc[index, 'min_distance'] = min(all_dist)
Upvotes: 2
Views: 1068
Reputation: 49774
You didn't provide any data to work with, so I will leave this answer as descriptive only.
As you expected, there is no reason to calculate the distance to every other building. The 7.5 meter specification means that the latitudes and longitudes will be VERY close to matching directly for any buildings that are that near each other.
The distance between latitude lines varies from about 110.6 km at the equator to about 111.7 km at the poles. If we round down to 100 km per degree, which adds some error margin and makes the analysis easier, the 0.0075 km maximum distance becomes a maximum of 0.000075 degrees of latitude. So any pair of buildings that meets the 0.0075 km distance standard will necessarily also meet the 0.000075 degrees of latitude standard. If you restrict the distance calculation to buildings that are within 0.000075 degrees of latitude of each other, you only need to run it on a much smaller subset of buildings.
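The arithmetic can be checked with a few lines of Python. This is only a sketch: the function name `lat_window_deg` and its parameters are made up for illustration, and the 110.6 km figure is the equatorial value quoted above.

```python
def lat_window_deg(max_km, km_per_deg=100.0):
    """Convert a distance in km to a latitude window in degrees.

    km_per_deg defaults to the rounded-down estimate of 100 km per degree.
    """
    return max_km / km_per_deg


max_km = 0.0075

# Window using the rounded-down estimate (about 0.000075 degrees)
rounded_window = lat_window_deg(max_km)

# Window using the shortest real degree of latitude (110.6 km at the equator)
tightest_window = lat_window_deg(max_km, km_per_deg=110.6)

# The rounded window is the wider of the two, so filtering on it
# cannot discard a pair that is really within max_km.
print(rounded_window >= tightest_window)
```

Because a degree of latitude is never shorter than the estimate, the rounded window only ever lets extra candidates through; it does not drop true matches.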
Therefore you can simply sort the location lists by latitude, and then traverse the lists, computing distances only for buildings whose latitude is within 0.000075 degrees (roughly 7.5 meters) of a building on the other list.
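One way this could look in pandas and NumPy is sketched below. It is not tested against your data, and several things are assumptions: the column names `building_id`, `lat`, `lng` are taken from your sample tables, `haversine_km` is a stand-in for your own `utils.distance_km`, and `nearest_within` is a hypothetical helper name. Instead of walking both sorted lists by hand, it sorts dataset 2 once and uses `np.searchsorted` to find each building's latitude window.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km; lat2/lng2 may be NumPy arrays."""
    lat1, lng1, lat2, lng2 = map(np.radians, (lat1, lng1, lat2, lng2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.minimum(1.0, a)))


def nearest_within(dataset_1, dataset_2, max_km=0.0075, lat_window=0.000075):
    """For each building in dataset_1, find the nearest dataset_2 building
    closer than max_km, checking only candidates inside the latitude window."""
    d2 = dataset_2.sort_values("lat").reset_index(drop=True)
    d2_lat = d2["lat"].to_numpy()
    d2_lng = d2["lng"].to_numpy()
    d2_id = d2["building_id"].to_numpy()

    # Binary search for the slice of dataset 2 inside each latitude window
    d1_lat = dataset_1["lat"].to_numpy()
    lo = np.searchsorted(d2_lat, d1_lat - lat_window, side="left")
    hi = np.searchsorted(d2_lat, d1_lat + lat_window, side="right")

    matches = []
    rows = zip(dataset_1["building_id"], dataset_1["lat"], dataset_1["lng"], lo, hi)
    for bid, lat, lng, start, stop in rows:
        if start == stop:
            continue  # no dataset 2 building in the latitude window
        dist = haversine_km(lat, lng, d2_lat[start:stop], d2_lng[start:stop])
        best = int(np.argmin(dist))
        if dist[best] < max_km:
            matches.append((bid, d2_id[start + best], float(dist[best])))

    return pd.DataFrame(
        matches, columns=["building_id_1", "building_id_2", "distance_km"]
    )
```

Calling `nearest_within(dataset_1, dataset_2)` returns one row per dataset 1 building that has a match. The latitude filter still lets through buildings at the same latitude but a very different longitude, which is why the haversine check is kept for the candidates.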
Upvotes: 1