amwalker

Reputation: 355

How to get distances from long/lat for multiple samples in a dataset in R

I am trying to calculate all pairwise distances between the long/lat coordinates of multiple samples in R and save them to an output file.

Example of data:

Sample     Latitude     Longitude
A          70           141
B          72           142
C          71           143
D          69           141

I am currently using the geosphere package in R, specifically the distVincentyEllipsoid function, which takes each point as c(longitude, latitude). You can use it like this:

distVincentyEllipsoid(p1 = c(141,70), p2 = c(142,72)) 

But this only gives one distance between two samples at a time. I need the distances between all pairs of samples (15 samples in total), written to an output file that lists each pair and its distance.

Example output:

Samples     Distance(m)
A-B             8
A-C             26
B-C             13
A-D             20

Thanks.

Upvotes: 1

Views: 561

Answers (3)

Shape

Reputation: 2952

So what you want is every pairwise combination of the samples, together with the coordinates of both samples in each pair. You can do this with joins and the data.table package:

library(data.table)
library(geosphere)
testdata <- data.table(Sample = LETTERS[1:4],
                   Latitude = c(70,72,71,69),
                   Longitude = c(141,142,143,141))

# Create each pair of combinations with combn
combTable <- rbindlist(combn(testdata$Sample,2,simplify = FALSE,FUN = as.list))

# Join on the first column
setkey(testdata,Sample)
setkey(combTable,V1)

combTable <- testdata[combTable]

# Join on the second column
setkey(combTable,V2)

combTable <- testdata[combTable]

# mapply calls the function once per row, passing one c(lon, lat) vector per point
combTable[,.(dist = mapply(function(Lat1, Lon1, Lat2, Lon2)
                          distVincentyEllipsoid(c(Lon1, Lat1), c(Lon2, Lat2)),
                          Latitude,
                          Longitude,
                          i.Latitude,
                          i.Longitude,
                          SIMPLIFY = FALSE),
         Sample,
         i.Sample)]

EDIT: Per @Arun's comment, this can be done in one step without setting keys or storing intermediate tables, using the on argument and magrittr pipes. Run it on the combTable produced by combn above, before any joins:

 library(magrittr)
 combTable <- 
   testdata[combTable, on = c('Sample' = 'V1')] %>% 
   testdata[., on = c(`Sample` = 'V2')] %>%
   .[,.(dist = mapply(function(Lat1, Lon1, Lat2, Lon2) 
                      distVincentyEllipsoid(c(Lon1, Lat1),c(Lon2, Lat2)),
                      Latitude,
                      Longitude,
                      i.Latitude,
                      i.Longitude,
                      SIMPLIFY = FALSE),
      Sample,
      i.Sample)]
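
Since the question also asks for an output file, here is a minimal self-contained sketch of the same pairwise idea that ends by writing a CSV. It relies on distVincentyEllipsoid accepting two-column (longitude, latitude) matrices for p1 and p2, so no mapply is needed. The names idx, coords, pairs and out_file, and the file name itself, are assumptions made for illustration, not part of the answer above.

```r
library(data.table)
library(geosphere)

testdata <- data.table(Sample = LETTERS[1:4],
                       Latitude = c(70, 72, 71, 69),
                       Longitude = c(141, 142, 143, 141))

# Row indices of every unordered pair, as a 2 x n_pairs matrix
idx <- combn(nrow(testdata), 2)

# geosphere expects longitude first, then latitude
coords <- as.matrix(testdata[, .(Longitude, Latitude)])

# One vectorised call: row i of p1 is paired with row i of p2
pairs <- data.table(
  Samples = paste(testdata$Sample[idx[1, ]], testdata$Sample[idx[2, ]], sep = "-"),
  Distance_m = distVincentyEllipsoid(coords[idx[1, ], , drop = FALSE],
                                     coords[idx[2, ], , drop = FALSE])
)

# Hypothetical output location; replace with your own path
out_file <- file.path(tempdir(), "pairwise_distances.csv")
fwrite(pairs, out_file)
```

With 15 samples this produces choose(15, 2) = 105 rows, one per pair.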

Upvotes: 3

akuiper

Reputation: 214957

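Here is another solution using the outer function. It assumes the data are in a data frame df with the columns Sample, Latitude and Longitude, in that order.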

library(geosphere)
myList <- setNames(split(df[,c(3,2)], seq_len(nrow(df))), df$Sample)
distMat <- outer(myList, myList, Vectorize(distVincentyEllipsoid))

This gives a matrix of pairwise distances in meters, computed by distVincentyEllipsoid. The result is as follows:

> distMat
         A        B        C        D
A      0.0 226082.2 134163.1 111555.6
B 226082.2      0.0 117066.1 336761.1
C 134163.1 117066.1      0.0 235802.0
D 111555.6 336761.1 235802.0      0.0

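Then convert it to the long format you want:

library(tidyr); library(dplyr)
# Zero out the lower triangle so each pair appears only once
distMat[lower.tri(distMat)] <- 0
distDf <- data.frame(distMat)
distDf$P1 <- row.names(distDf)
# Note: filter(Distance != 0) also drops any pair with identical coordinates
gather(distDf, P2, Distance, -P1) %>% filter(Distance != 0) %>% 
      mutate(Sample = paste(P1, P2, sep = "-")) %>% select(Sample, Distance)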
  Sample Distance
1    A-B 226082.2
2    A-C 134163.1
3    B-C 117066.1
4    A-D 111555.6
5    B-D 336761.1
6    C-D 235802.0

Note: I haven't had time to compare efficiency, but since this solution avoids repeatedly subsetting the original data frame, it should be relatively fast.
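
The matrix-then-reshape idea above can also be sketched with geosphere's own distm function, which builds the full distance matrix from a two-column longitude/latitude matrix, followed by a base R reshape that needs neither tidyr nor dplyr. This is only an illustrative variant, not the method used in the answer; the data frame is rebuilt from the question's sample data, and idx is a name introduced here.

```r
library(geosphere)

# The question's sample data; `df` is the name the answer above assumes
df <- data.frame(Sample = c("A", "B", "C", "D"),
                 Latitude = c(70, 72, 71, 69),
                 Longitude = c(141, 142, 143, 141))

# distm() expects longitude first, then latitude
distMat <- distm(as.matrix(df[, c("Longitude", "Latitude")]),
                 fun = distVincentyEllipsoid)
dimnames(distMat) <- list(df$Sample, df$Sample)

# Keep each unordered pair once by indexing the upper triangle
idx <- which(upper.tri(distMat), arr.ind = TRUE)
distDf <- data.frame(
  Sample = paste(df$Sample[idx[, "row"]], df$Sample[idx[, "col"]], sep = "-"),
  Distance = distMat[idx]
)
```

Indexing the upper triangle directly, rather than filtering out zeros, keeps pairs whose coordinates happen to coincide.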

Upvotes: 5

Kunal Puri

Reputation: 3427

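You can do it this way, using combn over the row indices (data is a data frame with the columns Sample, Latitude and Longitude, in that order):

library(geosphere)

sample_names <- data$Sample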

nrow_data <- nrow(data)

test <- function(x){
    return (list(Sample = paste(sample_names[x[1]],sample_names[x[2]],sep='-'),
        Distance.m = distVincentyEllipsoid(p1 = data[x[1],3:2], p2 = data[x[2],3:2])))
}

ans <- combn(1:nrow_data,2,test)

ans_df <- data.frame(Sample = unlist(ans[1,]),Distance.m = unlist(ans[2,]))

##  Sample Distance.m
##1    A-B   226082.2
##2    A-C   134163.1
##3    A-D   111555.6
##4    B-C   117066.1
##5    B-D   336761.1
##6    C-D   235802.0

Upvotes: 4
