Arturo Sbr
Arturo Sbr

Reputation: 6333

While loop inside a for loop to calculate geospatial distance between 2 datasets in R

I have a data.table with 957 geocodes. I want to match it with another dataset with 317 geocodes. The matching condition is geospatial proximity. I want to match each observation from the first dataset to an observation from the second one such that the distance between both observations is 5000 meters or less.

My data looks like this:

> muni[1:3]
         mun Lat_Decimal Lon_Decimal
1:      1001    21.76672   -102.2818
2:      1002    22.16597   -102.0657
3:      1003    21.86138   -102.7248
> stations[1:3]
   station_number station_lat station_long
1:          10003      25.100     -106.567
2:          10018      24.944     -106.259
3:          10031      24.523     -105.952

I am using the distm function from library(geosphere) to calculate the distance.

I figured the way to attack this problem is a while loop. The idea is to take the first observation from muni and measure the distance to the first observation in stations. If the distance is 5000 meters or less, then assign the station_number of the first observation in station to the first observation in muni. If the distance is greater than 5000, then try the next observation in muni until the distance is 5000 meters or less.

Essentially, it's a loop that finds the first observation in stations that's 5000 meters or closer to an observation in muni.

This is a preliminary attempt at it:

for (i in 1:957) {
  j = 1
  while (distm(muni[i, .(Lon_Decimal, Lat_Decimal)],
               stations[j, .(station_long, station_lat)]) > 5000 & j <= 317) {
    muni[i, station_number := as.integer(stations[j, station_number])]
    muni[i, distance := distm(muni[i, .(Lon_Decimal, Lat_Decimal)],
                                   stations[j, .(station_long, station_lat)])]
    j = j + 1
}
}

I can tell this is not working because none of the rows in ´muni´ appear to have been overwritten after running this loop for (i in 1:3). I suppose there is an error in my loop that is ignoring the station_number := and distance := parts.

I would expect this loop to overwrite muni such that all the entire column had a station_number.

Upvotes: 1

Views: 175

Answers (1)

Fons MA
Fons MA

Reputation: 1282

I've read your few sample points as data.frames and converted them to sf below for the answer. If you're attached to geosphere, forgive the pun, everything should still apply the same, given that geosphere::distm also returns a matrix of distances.

First we get your data into sf format:


library(sf)

stations_raw <- "station_number station_lat station_long
1:          10003      25.100     -106.567
2:          10018      24.944     -106.259
3:          10031      24.523     -105.952"


mun_raw <- "mun Lat_Decimal Lon_Decimal
1:      1001    21.76672   -102.2818
2:      1002    22.16597   -102.0657
3:      1003    21.86138   -102.7248"

mun_df <- read.table(text = mun_raw)

stations_df <- read.table(text = stations_raw)

mun_sf <- st_as_sf(mun_df, coords = c("Lon_Decimal", "Lat_Decimal"), crs = 4326)
stations_sf <-  st_as_sf(stations_df, 
                          coords = c("station_long", "station_lat"), crs = 4326)

Then, we find the minimum for each interaction between dots:

closest <- list()

for(i in seq_len(nrow(mun_sf))){
  closest[[i]] <- stations_sf[which.min(
    st_distance(stations_sf, mun_sf[i,])),]
}

Finally, we extract the identifiers and attach them to the original df, removing the mun_id as you request:


mun_sf$closest_station <- purrr::map_chr(closest, "station_number")

mun_sf <- mun_sf[, c("closest_station", "geometry")]

mun_sf
#> Simple feature collection with 3 features and 1 field
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -102.7248 ymin: 21.76672 xmax: -102.0657 ymax: 22.16597
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#>    closest_station                   geometry
#> 1:           10031 POINT (-102.2818 21.76672)
#> 2:           10031 POINT (-102.0657 22.16597)
#> 3:           10031 POINT (-102.7248 21.86138)

The plot below helps visually check that, in this toy example, we've got the right answer.

ggplot() +
  geom_sf(data = mun_sf, colour = "red") +
  geom_sf_text(data = mun_sf, aes(label = mun), nudge_x = 0.25) +
  geom_sf(data = stations_sf, colour = "blue") +
  geom_sf_text(data = stations_sf, aes(label = station_number), nudge_x = -0.25)
#> Warning in st_point_on_surface.sfc(sf::st_zm(x)): st_point_on_surface may
#> not give correct results for longitude/latitude data

#> Warning in st_point_on_surface.sfc(sf::st_zm(x)): st_point_on_surface may
#> not give correct results for longitude/latitude data

Upvotes: 1

Related Questions