Finding the lat-lon pairs with minimum Euclidean distance between two columns

Question

I am trying to find the most efficient way to solve the following puzzle in R without having to use nested for loops (nested for loops would take forever):

Let's say we have two data frames, d_zone2 and stops. Both contain the columns lat, long and zone (named Lat, Long and X8 in the code below), which describe the positions of points on a map that is divided into different polygons. The zone column in d_zone2 is initialized to NA. I want to fill it in by the following rule: for each lat-lon pair in d_zone2, find the lat-lon pair in stops with the minimum Euclidean distance to it, and assign that stop's zone to the d_zone2 row.

The initial solution I thought of was to sort the data frame stops in increasing order by lat, and then by long. Then, for each lat-lon pair in d_zone2, I can use nested for loops to go through all the successive lat-lon pairs in stops and determine between which two of them my d_zone2 point falls. The code is the following:

for (i in seq_len(nrow(d_zone2))) {
  # stop at nrow(stops) - 1 because row j + 1 is read below
  for (j in seq_len(nrow(stops) - 1)) {
    if (d_zone2$Lat[i] >= stops$Lat[j] && d_zone2$Long[i] >= stops$Long[j] &&
        d_zone2$Lat[i] <= stops$Lat[j + 1] && d_zone2$Long[i] <= stops$Long[j + 1]) {
      d_zone2$X8[i] <- stops$X8[j]
    }
  }
}

However, I realized that this is not quite right, because d_zone2$X8[i] might belong to stops$X8[j+1] (its lat-lon might be closer to that of stops$X8[j+1] than to that of stops$X8[j]). Thus, I think the only valid approach is to find which lat-lon pair in stops has the minimum Euclidean distance to each lat-lon pair in d_zone2. But I don't know how to do this in R without using nested for() loops.

2nd approach: Another approach is to take advantage of the list of polygons stored in the zone.csv file below. The solution would then be to find the polygon that both a lat-lon pair in d_zone2 and a lat-lon pair in stops fall into, and assign the zone number in stops$X8 for that stop to the corresponding element in d_zone2.

Question: Could anyone please help me solve this puzzle using either the Euclidean approach or the 2nd approach described above? I want to use dplyr::select(dplyr::left_join(x = d_zone2, y = stops %>% select("Lat", "Long", X8), by = ...)), but I am not sure what condition to supply for by.

.RData file containing both data frames d_zone2 and stops. Warning: Quite large files!

Geo-polygon coordinates

Martin Schmelzer · Accepted Answer

This takes about 15 seconds because we filter out duplicate points first. This leaves us with 5457 unique points in d_zone2. For each of them we compute the distance to all stops and get the index of the stop with the minimum distance. Afterwards you can match the zones by stop id to all 19228939 points.

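library(sp)          # provides spDists()
library(data.table)

setDT(d_zone2)       # convert to a data.table by reference (no copy)

# Stop coordinates as a (Long, Lat) matrix; columns 3:2 of stops are taken to be Long and Lat
stop_points <- as.matrix(stops[, 3:2])
# Keep one row per distinct coordinate pair
short <- unique(d_zone2, by = c("Long", "Lat"))
# For each unique point, take the zone (X8) of the stop at minimum Euclidean distance
short[, ZONE := stops[which.min(spDists(x = stop_points, y = cbind(Long, Lat))),]$X8, by=.(Long, Lat)]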
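
The last step, carrying the zones back to every row of d_zone2, is not shown above. Here is a minimal sketch of the whole pipeline on toy data, under a few assumptions: the coordinates and zone numbers are made up, the squared distance is computed with plain arithmetic instead of spDists() (which gives the same nearest stop for planar Euclidean distance), and the zones are matched back on the (Long, Lat) pair with a data.table update join rather than on a stop id.

```r
library(data.table)

# Toy stand-ins for the real data (all values are invented for illustration)
stops <- data.frame(
  Lat  = c(0, 0, 5),
  Long = c(0, 5, 5),
  X8   = c(10L, 20L, 30L)   # zone of each stop
)
d_zone2 <- data.table(
  Lat  = c(0.1, 0.1, 4.8, 0.2),
  Long = c(0.2, 0.2, 4.9, 4.7)   # rows 1 and 2 are duplicates
)

# Step 1: one row per distinct coordinate pair, then the zone of the nearest stop.
# Inside each group, Long and Lat are single values, so the subtraction is
# vectorised over all stops.
short <- unique(d_zone2, by = c("Long", "Lat"))
short[, ZONE := stops$X8[which.min((stops$Long - Long)^2 + (stops$Lat - Lat)^2)],
      by = .(Long, Lat)]

# Step 2: update join; every row of d_zone2 receives the ZONE of its coordinate pair
d_zone2[short, on = .(Long, Lat), ZONE := i.ZONE]

print(d_zone2)
```

The update join modifies d_zone2 in place, so no second copy of the full table is made; that matters when the table has millions of rows.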
