Reputation: 881
I created 2 data frames with geom columns (of POINT type). Now I would like to calculate distance between each pair of points, e.g. point from 1st row in first df with point from 1st row in second df etc. Here are my data frames:
df1 <- table %>%
st_as_sf(coords = c("lonCust","latCust"), crs = 4326)
df2 <- table %>%
st_as_sf(coords = c("lonApp","latApp"), crs = 4326)
I used st_distance:
distance <- st_distance(df1$geometry,df2$geometry)
but I got a matrix in which the distance is calculated for every pair of points from the two geom columns:
[,1] [,2] [,3] [,4] [,5] ...
[1,] 139.7924 7735.5718 15225.02995 558.104089 1016.58121
[2,] 8503.0544 755.2915 8764.75396 7957.289600 8788.02800
[3,] 15306.5855 9336.9008 18.96914 14876.589918 15929.51643
[4,] 548.3045 7232.0164 14898.70637 8.094068 1078.38236
[5,] 911.5635 8084.3086 15993.36365 1127.730022 46.97799
.
.
I wanted distance to be calculated in one column, only between corresponding geom rows:
[,1]
[1,] 139.7924
[2,] 8503.0544
[3,] 15306.5855
[4,] 548.3045
[5,] 911.5635
.
.
I read about the geosphere package, but sf has a very nice st_distance function for measuring distance, and I would like to use it. Most importantly, do I first need to join those data frames? A simple inner_join from dplyr doesn't allow joining two spatial data frames, and st_join is not an option for me here because I don't want to join by geometries (the geometries in the two data frames are totally different).
Upvotes: 3
Views: 1486
Reputation: 4669
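This method works by projecting both sets of points into a planar CRS, extracting the X/Y coordinates, and computing the Euclidean distance between corresponding rows.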
For this to work, you need an equidistant CRS, meaning one in which distances across your area of interest are (approximately) preserved.
Let's say that you were interested in calculating distances across the USA: you could use EPSG:102005. See this GIS answer for more details. The choice of CRS here is crucial, so make sure you get this right, or else the answer will be nonsense.
library(sf)
library(dplyr)
crs.source = 4326
crs.dest = st_crs("+proj=eqdc +lat_0=39 +lon_0=-96 +lat_1=33 +lat_2=45 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs")
# coords1 and coords2 are matrices with columns X and Y and one row per point, in the `crs.dest` coordinate system.
coords1 <- table %>%
  st_as_sf(coords = c("lonCust","latCust"), crs = crs.source) %>%
  st_transform(crs.dest) %>%
  st_coordinates()
coords2 <- table %>%
  st_as_sf(coords = c("lonApp","latApp"), crs = crs.source) %>%
  st_transform(crs.dest) %>%
  st_coordinates()
# This is a vectorised computation, and so should be instant for a mere 25,000 rows :-)
table$distances = local({
  x_diff = coords1[, 'X'] - coords2[, 'X']
  y_diff = coords1[, 'Y'] - coords2[, 'Y']
  # Euclidean distance between corresponding rows, in the units of crs.dest (metres)
  return(sqrt(x_diff^2 + y_diff^2))
})
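Since the CRS choice matters so much, it can be worth sanity-checking the planar result against a geodesic distance on a few rows. Here is a minimal sketch, assuming a made-up `toy` data frame with the same column names as in the question (the coordinates are hypothetical, not taken from the asker's data):

```r
library(sf)

# Hypothetical toy data: two customer/app coordinate pairs inside the USA.
toy <- data.frame(
  lonCust = c(-96.0, -90.0), latCust = c(39.0, 35.0),
  lonApp  = c(-96.1, -90.2), latApp  = c(39.1, 35.1)
)

crs_dest <- st_crs("+proj=eqdc +lat_0=39 +lon_0=-96 +lat_1=33 +lat_2=45 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs")

p1 <- st_as_sf(toy, coords = c("lonCust", "latCust"), crs = 4326)
p2 <- st_as_sf(toy, coords = c("lonApp", "latApp"), crs = 4326)

# Planar distance: project, take coordinates, apply Pythagoras row by row.
xy1 <- st_coordinates(st_transform(p1, crs_dest))
xy2 <- st_coordinates(st_transform(p2, crs_dest))
planar <- sqrt((xy1[, "X"] - xy2[, "X"])^2 + (xy1[, "Y"] - xy2[, "Y"])^2)

# Reference distance computed by sf on the unprojected lon/lat points.
geodesic <- as.numeric(st_distance(p1, p2, by_element = TRUE))

# Relative disagreement between the two methods, one value per row.
rel_error <- abs(planar - geodesic) / geodesic
print(data.frame(planar, geodesic, rel_error))
```

If `rel_error` is large for your own data, that is a hint the projection does not suit your area of interest.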
Upvotes: 2
Reputation: 416
As @mrhellmann mentioned, you could just add by_element = TRUE and that should work. If speed is still an issue, I recommend using distGeo() from the geosphere package. But be sure to look at the documentation to check that your data is appropriate for this function.
library(geosphere)
library(tidyverse)
library(sf)
library(units)  # provides set_units()
df1 <- table %>%
  st_as_sf(coords = c("lonCust","latCust"), crs = 4326)
doParallel::registerDoParallel()
df_crs4326 <- df1 %>%
  group_by(your_id_here) %>%
  mutate(
    # POINT geometries are stored as (x, y), i.e. (lon, lat)
    lonCust = map(geometry, 1) %>% unlist(),
    latCust = map(geometry, 2) %>% unlist(),
    # geometry_2 = st_as_sfc(coords = c("lonApp","latApp"), crs = 4326)
  ) %>%
  mutate(
    # cbind() builds two-column lon/lat matrices, so distGeo() pairs the points row by row
    distance_to_next = distGeo(cbind(lonCust, latCust), cbind(lonApp, latApp)) %>% set_units(m),
    # distance_2 = st_distance(geometry, geometry_2, by_element = TRUE)
  ) %>%
  ungroup()
Note that I am not sure the commented out parts work without testing on reproducible data.
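For the by_element route mentioned at the top of this answer, here is a minimal sketch on made-up data (the `toy` data frame and its coordinates are hypothetical; the column names follow the question). It also checks that the element-wise result matches the diagonal of the full matrix the asker got:

```r
library(sf)

# Hypothetical toy data with the question's column names.
toy <- data.frame(
  lonCust = c(21.00, 19.94, 17.03), latCust = c(52.23, 50.06, 51.11),
  lonApp  = c(21.01, 19.90, 17.00), latApp  = c(52.20, 50.10, 51.10)
)

df1 <- st_as_sf(toy, coords = c("lonCust", "latCust"), crs = 4326)
df2 <- st_as_sf(toy, coords = c("lonApp", "latApp"), crs = 4326)

# Full n x n matrix: every point in df1 against every point in df2.
full <- st_distance(df1, df2)

# One distance per row: i-th point of df1 against i-th point of df2.
pairwise <- st_distance(df1, df2, by_element = TRUE)

# The element-wise result is the diagonal of the full matrix.
full_diag <- diag(units::drop_units(full))
pairwise_num <- as.numeric(pairwise)
stopifnot(isTRUE(all.equal(full_diag, pairwise_num)))

toy$distance <- pairwise
```

No join is needed here: both sf objects come from the same table, so their rows are already aligned.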
Upvotes: 3