Reputation: 355
I am trying to calculate all pairwise distances between the long/lat coordinates of multiple samples in R and save them to an output file.
Example of data:
Sample Latitude Longitude
A 70 141
B 72 142
C 71 143
D 69 141
I am currently using the geosphere package in R, specifically the distVincentyEllipsoid function. You can use it like this:
distVincentyEllipsoid(p1 = c(141,70), p2 = c(142,72))
But this only gives one distance between two samples at a time. I need the distances between all pairs of my 15 samples, written to an output file that lists each sample pair and its distance.
Example output:
Samples Distance(m)
A-B 8
A-C 26
B-C 13
A-D 20
Thanks.
Upvotes: 1
Views: 561
Reputation: 2952
So what you want is each pairwise combination of the locations, together with the coordinates of both. You can do this with joins and the data.table package:
library(data.table)
library(geosphere)
testdata <- data.table(Sample = LETTERS[1:4],
                       Latitude = c(70, 72, 71, 69),
                       Longitude = c(141, 142, 143, 141))
# Create each pair of combinations with combn
combTable <- rbindlist(combn(testdata$Sample,2,simplify = FALSE,FUN = as.list))
# Join on the first column
setkey(testdata,Sample)
setkey(combTable,V1)
combTable <- testdata[combTable]
#Join on the second column
setkey(combTable,V2)
combTable <- testdata[combTable]
# Mapply to fit the function's requirements of two vectors for each call
combTable[, .(dist = mapply(function(Lat1, Lon1, Lat2, Lon2)
                              distVincentyEllipsoid(c(Lon1, Lat1), c(Lon2, Lat2)),
                            Latitude,
                            Longitude,
                            i.Latitude,
                            i.Longitude,
                            SIMPLIFY = FALSE),
              Sample,
              i.Sample)]
EDIT: doing this in one step without storing intermediate variables, per @Arun's comment (and using magrittr syntax):
library(magrittr)
combTable <-
  testdata[combTable, on = c('Sample' = 'V1')] %>%
  testdata[., on = c(`Sample` = 'V2')] %>%
  .[, .(dist = mapply(function(Lat1, Lon1, Lat2, Lon2)
                        distVincentyEllipsoid(c(Lon1, Lat1), c(Lon2, Lat2)),
                      Latitude,
                      Longitude,
                      i.Latitude,
                      i.Longitude,
                      SIMPLIFY = FALSE),
        Sample,
        i.Sample)]
Upvotes: 3
Reputation: 214957
Here is another solution with the outer function.
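library(geosphere)
# df is the data frame from the question (columns Sample, Latitude, Longitude);
# c(3,2) picks Longitude, Latitude in the order geosphere expects
myList <- setNames(split(df[,c(3,2)], seq_len(nrow(df))), df$Sample)
distMat <- outer(myList, myList, Vectorize(distVincentyEllipsoid))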
This gives a distance matrix whose entries are computed by distVincentyEllipsoid. The result is as follows:
> distMat
A B C D
A 0.0 226082.2 134163.1 111555.6
B 226082.2 0.0 117066.1 336761.1
C 134163.1 117066.1 0.0 235802.0
D 111555.6 336761.1 235802.0 0.0
Convert it to the format you want.
library(tidyr); library(dplyr)
distMat[lower.tri(distMat)] <- 0
distDf <- data.frame(distMat)
distDf$P1 <- row.names(distDf)
gather(distDf, P2, Distance, -P1) %>% filter(Distance != 0) %>%
mutate(Sample = paste(P1, P2, sep = "-")) %>% select(Sample, Distance)
Sample Distance
1 A-B 226082.2
2 A-C 134163.1
3 B-C 117066.1
4 A-D 111555.6
5 B-D 336761.1
6 C-D 235802.0
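The question also asks for the pairs to be written to an output file. A minimal sketch in base R, assuming the distMat values printed above (hard-coded here so the snippet runs without geosphere); the file name distances.csv and the column names are only illustrative:

```r
# Distance matrix copied from the printed output above (metres)
samples <- c("A", "B", "C", "D")
distMat <- matrix(c(0.0,      226082.2, 134163.1, 111555.6,
                    226082.2, 0.0,      117066.1, 336761.1,
                    134163.1, 117066.1, 0.0,      235802.0,
                    111555.6, 336761.1, 235802.0, 0.0),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(samples, samples))

# Row/column indices of the upper triangle, i.e. each unordered pair once
idx <- which(upper.tri(distMat), arr.ind = TRUE)

# One row per pair: "A-B", "A-C", ... with the matching distance
pairs <- data.frame(
  Samples    = paste(rownames(distMat)[idx[, "row"]],
                     colnames(distMat)[idx[, "col"]], sep = "-"),
  Distance.m = distMat[idx],
  stringsAsFactors = FALSE
)

# "distances.csv" is a hypothetical file name; tempdir() keeps the sketch
# from writing into the working directory
out_file <- file.path(tempdir(), "distances.csv")
write.csv(pairs, out_file, row.names = FALSE)
```

With the real data you would build distMat as shown earlier and point out_file at wherever you want the file to go.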
Note: I haven't had time to compare efficiency, but since this solution avoids repeatedly subsetting rows from the original data frame, it should be relatively fast.
Upvotes: 5
Reputation: 3427
You can do it this way, where data is the data frame from the question (columns Sample, Latitude, Longitude):
sample_names <- data$Sample
nrow_data <- nrow(data)
test <- function(x){
  return (list(Sample = paste(sample_names[x[1]], sample_names[x[2]], sep = '-'),
               Distance.m = distVincentyEllipsoid(p1 = data[x[1], 3:2], p2 = data[x[2], 3:2])))
}
ans <- combn(1:nrow_data,2,test)
ans_df <- data.frame(Sample = unlist(ans[1,]),Distance.m = unlist(ans[2,]))
## Sample Distance.m
##1 A-B 226082.2
##2 A-C 134163.1
##3 A-D 111555.6
##4 B-C 117066.1
##5 B-D 336761.1
##6 C-D 235802.0
Upvotes: 4