mustafa00

Reputation: 881

How to measure distance between points in separate data frames?

I created two data frames, each with a geometry column of POINT type. Now I would like to calculate the distance between corresponding points, e.g. the point in the 1st row of the first data frame with the point in the 1st row of the second, and so on. Here are my data frames:

df1 <- table %>%
  st_as_sf(coords = c("lonCust","latCust"), crs = 4326)

df2 <- table %>%
  st_as_sf(coords = c("lonApp","latApp"), crs = 4326)

I used st_distance:

distance <- st_distance(df1$geometry,df2$geometry)

but I got a matrix with the distance calculated for every possible pair of points from the two geometry columns:

           [,1]      [,2]        [,3]         [,4]        [,5]  ...
[1,]   139.7924 7735.5718 15225.02995   558.104089  1016.58121
[2,]  8503.0544  755.2915  8764.75396  7957.289600  8788.02800
[3,] 15306.5855 9336.9008    18.96914 14876.589918 15929.51643
[4,]   548.3045 7232.0164 14898.70637     8.094068  1078.38236
[5,]   911.5635 8084.3086 15993.36365  1127.730022    46.97799
.
.

I wanted the distances in a single column, calculated only between corresponding rows:

           [,1]     
[1,]   139.7924 
[2,]  8503.0544
[3,] 15306.5855 
[4,]   548.3045
[5,]   911.5635
.
.

I read about the geosphere package, but sf has the very nice st_distance function for measuring distance and I would like to use it. Most importantly, do I first need to join those data frames? A simple inner_join from dplyr doesn't allow joining two spatial data frames, and st_join is not an option for me here because I don't want to join by geometry (the geometries in the two data frames are totally different).

Upvotes: 3

Views: 1486

Answers (2)

iamyojimbo

Reputation: 4669

Super Fast Vectorised Computation

This method works by:

  1. Projecting the (longitude, latitude) coordinates to an equidistant coordinate system suited to your region of interest. (Such a projection keeps distance distortion small across the region it was designed for, so basic plane geometry gives a good approximation of the true distances.)
  2. Converting the geometries to a base R matrix with X and Y columns.
  3. Finally, using Pythagoras's theorem to calculate the distance between each pair of points.

Get the Coordinate Reference System (CRS) right first

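For this to work, you need an equidistant CRS suited to your area of interest. No flat projection preserves every distance exactly: an equidistant conic projection, for example, is exact only along meridians and its standard parallels, and approximate elsewhere within the region it was designed for.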

Say you were interested in calculating distances across the USA: you could use EPSG:102005 (strictly an ESRI code, ESRI:102005, USA Contiguous Equidistant Conic). See this GIS answer for more details. The choice of CRS here is crucial, so make sure you get it right, or the results will be nonsense.

Applied to your example

library(sf)
library(dplyr)  # for %>%

crs.source = 4326
crs.dest = st_crs("+proj=eqdc +lat_0=39 +lon_0=-96 +lat_1=33 +lat_2=45 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs")

# coords1 and coords2 are matrices with columns X and Y, one row per point, in the `crs.dest` coordinate system.
coords1 <- table %>%
  st_as_sf(coords = c("lonCust","latCust"), crs = crs.source) %>%
  st_transform(crs.dest) %>%
  st_coordinates()
  
coords2 <- table %>%
  st_as_sf(coords = c("lonApp","latApp"), crs = crs.source) %>%
  st_transform(crs.dest) %>%
  st_coordinates()

# This is a vectorised computation, and so should be instant for a mere 25,000 rows :-)
table$distances = local({
  x_diff = coords1[, 'X'] - coords2[, 'X']
  y_diff = coords1[, 'Y'] - coords2[, 'Y']
  sqrt(x_diff^2 + y_diff^2)
})
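
Since the result depends so heavily on the CRS, it may be worth checking the planar distances against sf's own geodesic calculation on a few rows before trusting them. The following is only a sketch: the sample coordinates in `pts` are made up for illustration, and `rel_error` is a hypothetical name for the relative difference between the two methods.

```r
library(sf)

# Hypothetical sample points inside the contiguous USA (lon, lat).
pts <- data.frame(
  lonCust = c(-96.0, -100.0),
  latCust = c(39.0, 40.0),
  lonApp  = c(-96.1, -99.5),
  latApp  = c(39.1, 40.5)
)

crs_dest <- st_crs("+proj=eqdc +lat_0=39 +lon_0=-96 +lat_1=33 +lat_2=45 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs")

a <- st_as_sf(pts, coords = c("lonCust", "latCust"), crs = 4326)
b <- st_as_sf(pts, coords = c("lonApp", "latApp"), crs = 4326)

# Reference: distance between corresponding rows as computed by sf on lon/lat.
geodesic <- as.numeric(st_distance(a, b, by_element = TRUE))

# Planar (Pythagorean) distance after projecting both point sets.
xy_a <- st_coordinates(st_transform(a, crs_dest))
xy_b <- st_coordinates(st_transform(b, crs_dest))
planar <- sqrt(rowSums((xy_a - xy_b)^2))

# Relative difference between the two methods, one value per row.
rel_error <- abs(planar - geodesic) / geodesic
print(data.frame(geodesic, planar, rel_error))
```

If `rel_error` is larger than you can tolerate for your region, that suggests the chosen projection is not a good fit for the data.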

Upvotes: 2

k6adams

Reputation: 416

As @mrhellmann mentioned, you could just add by_element = TRUE to your st_distance() call and that should work. If speed is still an issue, I recommend distGeo() from the geosphere package. But be sure to check the documentation to confirm that your data is appropriate for this function.

library(geosphere)
library(tidyverse)
library(sf)
library(units)  # for set_units()

df1 <- table %>%
  st_as_sf(coords = c("lonCust","latCust"), crs = 4326)

df_crs4326 <- df1 %>%
  mutate(
    # A POINT geometry stores c(x, y), i.e. longitude first, then latitude.
    lonCust = map_dbl(geometry, 1),
    latCust = map_dbl(geometry, 2)
    # geometry_2 = st_as_sfc(coords = c("lonApp","latApp"), crs = 4326)
  ) %>%
  mutate(
    # distGeo() accepts two-column (lon, lat) matrices and works row by row,
    # so no grouping by id is needed.
    distance_to_next = distGeo(cbind(lonCust, latCust), cbind(lonApp, latApp)) %>% set_units(m)
    # distance_2 = st_distance(geometry, geometry_2, by_element = TRUE)
  )

Note that I have not tested the commented-out parts on reproducible data, so I am not sure they work.
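
For completeness, the by_element approach mentioned at the top of this answer could look roughly like the sketch below. The data frame `tbl` and its coordinates are invented for illustration; the column names follow the question. No join is needed, because both sf objects are built from the same table and therefore share the same row order.

```r
library(sf)

# Hypothetical stand-in for the question's `table` (lon/lat in degrees).
tbl <- data.frame(
  lonCust = c(21.00, 19.94),
  latCust = c(52.23, 50.06),
  lonApp  = c(21.01, 19.95),
  latApp  = c(52.24, 50.07)
)

df1 <- st_as_sf(tbl, coords = c("lonCust", "latCust"), crs = 4326)
df2 <- st_as_sf(tbl, coords = c("lonApp", "latApp"), crs = 4326)

# by_element = TRUE returns one distance per pair of corresponding rows
# instead of the full n-by-n matrix.
tbl$distance <- st_distance(df1, df2, by_element = TRUE)

print(tbl)
```

The result is a units vector in metres; wrap it in as.numeric() if a plain numeric column is preferred.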

Upvotes: 3
