Reputation: 6333
I have a data.table
with 957 geocodes. I want to match it with another dataset with 317 geocodes. The matching condition is geospatial proximity. I want to match each observation from the first dataset to an observation from the second one such that the distance between both observations is 5000 meters or less.
My data looks like this:
> muni[1:3]
mun Lat_Decimal Lon_Decimal
1: 1001 21.76672 -102.2818
2: 1002 22.16597 -102.0657
3: 1003 21.86138 -102.7248
> stations[1:3]
station_number station_lat station_long
1: 10003 25.100 -106.567
2: 10018 24.944 -106.259
3: 10031 24.523 -105.952
I am using the distm
function from library(geosphere)
to calculate the distance.
I figured the way to attack this problem is a while
loop. The idea is to take the first observation from muni
and measure the distance to the first observation in stations
. If the distance is 5000 meters or less, then assign the station_number
of the first observation in station
to the first observation in muni
. If the distance is greater than 5000, then try the next observation in muni
until the distance is 5000 meters or less.
Essentially, it's a loop that finds the first observation in stations
that's 5000 meters or closer to an observation in muni
.
This is a preliminary attempt at it:
for (i in 1:957) {
j = 1
while (distm(muni[i, .(Lon_Decimal, Lat_Decimal)],
stations[j, .(station_long, station_lat)]) > 5000 & j <= 317) {
muni[i, station_number := as.integer(stations[j, station_number])]
muni[i, distance := distm(muni[i, .(Lon_Decimal, Lat_Decimal)],
stations[j, .(station_long, station_lat)])]
j = j + 1
}
}
I can tell this is not working because none of the rows in ´muni´ appear to have been overwritten after running this loop for (i in 1:3)
. I suppose there is an error in my loop that is ignoring the station_number :=
and distance :=
parts.
I would expect this loop to overwrite muni
such that all the entire column had a station_number
.
Upvotes: 1
Views: 175
Reputation: 1282
I've read your few sample points as data.frames
and converted them to sf
below for the answer. If you're attached to geosphere
, forgive the pun, everything should still apply the same, given that geosphere::distm
also returns a matrix of distances.
First we get your data into sf
format:
library(sf)
stations_raw <- "station_number station_lat station_long
1: 10003 25.100 -106.567
2: 10018 24.944 -106.259
3: 10031 24.523 -105.952"
mun_raw <- "mun Lat_Decimal Lon_Decimal
1: 1001 21.76672 -102.2818
2: 1002 22.16597 -102.0657
3: 1003 21.86138 -102.7248"
mun_df <- read.table(text = mun_raw)
stations_df <- read.table(text = stations_raw)
mun_sf <- st_as_sf(mun_df, coords = c("Lon_Decimal", "Lat_Decimal"), crs = 4326)
stations_sf <- st_as_sf(stations_df,
coords = c("station_long", "station_lat"), crs = 4326)
Then, we find the minimum for each interaction between dots:
closest <- list()
for(i in seq_len(nrow(mun_sf))){
closest[[i]] <- stations_sf[which.min(
st_distance(stations_sf, mun_sf[i,])),]
}
Finally, we extract the identifiers and attach them to the original df, removing the mun_id as you request:
mun_sf$closest_station <- purrr::map_chr(closest, "station_number")
mun_sf <- mun_sf[, c("closest_station", "geometry")]
mun_sf
#> Simple feature collection with 3 features and 1 field
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: -102.7248 ymin: 21.76672 xmax: -102.0657 ymax: 22.16597
#> epsg (SRID): 4326
#> proj4string: +proj=longlat +datum=WGS84 +no_defs
#> closest_station geometry
#> 1: 10031 POINT (-102.2818 21.76672)
#> 2: 10031 POINT (-102.0657 22.16597)
#> 3: 10031 POINT (-102.7248 21.86138)
The plot below helps visually check that, in this toy example, we've got the right answer.
ggplot() +
geom_sf(data = mun_sf, colour = "red") +
geom_sf_text(data = mun_sf, aes(label = mun), nudge_x = 0.25) +
geom_sf(data = stations_sf, colour = "blue") +
geom_sf_text(data = stations_sf, aes(label = station_number), nudge_x = -0.25)
#> Warning in st_point_on_surface.sfc(sf::st_zm(x)): st_point_on_surface may
#> not give correct results for longitude/latitude data
#> Warning in st_point_on_surface.sfc(sf::st_zm(x)): st_point_on_surface may
#> not give correct results for longitude/latitude data
Upvotes: 1