waxattax

Reputation: 357

Restructuring data for geographic proximity analyses in R

I have a data set of people's geographic coordinates, which looks like this:

Person  Latitude    Longitude
  1     46.0614     -23.9386
  2     48.1792      63.1136
  3     59.9289      66.3883
  4     42.8167      58.3167
  5     43.1167      63.25

I plan to calculate geographic proximity at the dyadic level, using the geosphere package in R. To do that, I need to restructure the data so that it looks like this:

Person1 Person2 LatitudeP1  LongitudeP1 LatitudeP2  LongitudeP2
   1       2     46.0614    -23.9386     48.1792     63.1136
   1       3     46.0614    -23.9386     59.9289     66.3883
   1       4     46.0614    -23.9386     42.8167     58.3167
   1       5     46.0614    -23.9386     43.1167     63.25
   2       3     48.1792     63.1136     59.9289     66.3883
   2       4     48.1792     63.1136     42.8167     58.3167
   2       5     48.1792     63.1136     43.1167     63.25
   3       4     59.9289     66.3883     42.8167     58.3167
   3       5     59.9289     66.3883     43.1167     63.25
   4       5     42.8167     58.3167     43.1167     63.25

Thus, the resulting data has a row for each possible dyad in the data set, and includes the coordinates of both individuals in the dyad. "LatitudeP1" and "LongitudeP1" are the coordinates for "Person1" in the dyad, and "LatitudeP2" and "LongitudeP2" are the coordinates for "Person2" in the dyad. Also, it doesn't matter which ID is listed as Person1 versus Person2, since geographic distance is not a directed relationship.

Upvotes: 0

Views: 248

Answers (2)

jlhoward

Reputation: 59365

If you want the pairwise distances, and you are already using the geosphere package, why not use distm(...) directly instead of jumping through all these fiery hoops?

# df is the dataset from your question
library(geosphere)
distm(df[,3:2],fun=distHaversine)   # distance in *meters*
#         [,1]      [,2]    [,3]      [,4]      [,5]
# [1,]       0 6224407.2 5743824 6243068.1 6553157.4
# [2,] 6224407       0.0 1324950  704260.1  563654.6
# [3,] 5743824 1324949.8       0 1982326.1 1883584.1
# [4,] 6243068  704260.1 1982326       0.0  403183.0
# [5,] 6553157  563654.6 1883584  403183.0       0.0

You could also use the fossil package.

library(fossil)
earth.dist(df[,3:2],dist=FALSE)     # distance in *kilometers*
#          [,1]      [,2]     [,3]      [,4]      [,5]
# [1,]    0.000 6219.1967 5739.016 6237.8420 6547.6718
# [2,] 6219.197    0.0000 1323.841  703.6706  563.1828
# [3,] 5739.016 1323.8407    0.000 1980.6667 1882.0073
# [4,] 6237.842  703.6706 1980.667    0.0000  402.8455
# [5,] 6547.672  563.1828 1882.007  402.8455    0.0000

Note that both functions expect longitude first, then latitude, so you have to pass columns 3:2, not 2:3.
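To make the longitude-first convention concrete, here is a minimal base-R sketch of the haversine formula. It is not geosphere's implementation; `haversine_m` is a hypothetical helper name, and the radius of 6378137 m is assumed to match the default documented for distHaversine. Columns are referenced by name, so the order cannot be mixed up.

```r
# Haversine great-circle distance in meters; inputs are in degrees.
# r = 6378137 is assumed to match the default radius of geosphere::distHaversine.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))   # pmin() guards against rounding above 1
}

# The data from the question
df <- data.frame(Person    = 1:5,
                 Latitude  = c(46.0614, 48.1792, 59.9289, 42.8167, 43.1167),
                 Longitude = c(-23.9386, 63.1136, 66.3883, 58.3167, 63.25))

# Full pairwise matrix: outer() passes vectors of row indices i and j
n <- nrow(df)
d <- outer(seq_len(n), seq_len(n), function(i, j)
  haversine_m(df$Longitude[i], df$Latitude[i],
              df$Longitude[j], df$Latitude[j]))
round(d)
```

Under those assumptions the result should agree with the distm() matrix above up to rounding.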


EDIT: Response to OP's comment.

"Edge list" sounds like you want to end up with an igraph object. You can use the distance matrix as an adjacency matrix in igraph, and the distances will populate the weights on an edge list automatically.

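library(igraph)
library(geosphere)
# graph_from_adjacency_matrix() is the current name for graph.adjacency()
g <- graph_from_adjacency_matrix(distm(df[,3:2],fun=distHaversine),
                                 mode="undirected",weighted=TRUE)
set.seed(1)   # for reproducible plot
# layout_with_fr() is the current name for layout.fruchterman.reingold()
plot(g, layout=layout_with_fr(g,weights=E(g)$weight))

# as_data_frame() is the current name for get.data.frame()
as_data_frame(g,"edges")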
#    from to    weight
# 1     1  2 6224407.2
# 2     1  3 5743824.5
# 3     1  4 6243068.1
# 4     1  5 6553157.4
# 5     2  3 1324949.8
# 6     2  4  704260.1
# 7     2  5  563654.6
# 8     3  4 1982326.1
# 9     3  5 1883584.1
# 10    4  5  403183.0
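
If the edge list is all you need, the same long format can be pulled out of the distance matrix without igraph. This is a base-R sketch; the matrix `m` is copied (rounded to whole meters) from the distm() output above, and the column names `Person1`, `Person2` and `Distance` are my own choice.

```r
# Pairwise distances in meters, rounded from the distm() output above
m <- matrix(c(
        0, 6224407, 5743824, 6243068, 6553157,
  6224407,       0, 1324950,  704260,  563655,
  5743824, 1324950,       0, 1982326, 1883584,
  6243068,  704260, 1982326,       0,  403183,
  6553157,  563655, 1883584,  403183,       0),
  nrow = 5, byrow = TRUE)

# (row, col) positions of the upper triangle, so each dyad appears once
idx <- which(upper.tri(m), arr.ind = TRUE)

edges <- data.frame(Person1  = idx[, "row"],
                    Person2  = idx[, "col"],
                    Distance = m[idx])      # matrix indexing by (row, col)

# which() returns column-major order; sort to match the order in the question
edges <- edges[order(edges$Person1, edges$Person2), ]
rownames(edges) <- NULL
edges
```

Because the distance is symmetric, the upper triangle alone covers every dyad.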

Upvotes: 1

rawr

Reputation: 20811

Take all pairwise combinations (combn) of persons 1 through 5, then use them to subset the latitude/longitude columns of your original data:

dat <- read.table(header = TRUE, text="Person  Latitude    Longitude
1     46.0614     -23.9386
2     48.1792      63.1136
3     59.9289      66.3883
4     42.8167      58.3167
5     43.1167      63.25")

tmp <- t(combn(nrow(dat),2))

#       [,1] [,2]
#  [1,]    1    2
#  [2,]    1    3
#  [3,]    1    4
#  [4,]    1    5
#  [5,]    2    3
#  [6,]    2    4
#  [7,]    2    5
#  [8,]    3    4
#  [9,]    3    5
# [10,]    4    5

res <- cbind(tmp,
             do.call('cbind', lapply(1:2, function(x) 
               mapply(`[`, dat[, 2:3], MoreArgs = list(i=tmp[, x])))))
colnames(res) <- c('Person1','Person2','LatitudeP1','LongitudeP1',
                   'LatitudeP2','LongitudeP2')

data.frame(res)

#    Person1 Person2 LatitudeP1 LongitudeP1 LatitudeP2 LongitudeP2
# 1        1       2    46.0614    -23.9386    48.1792     63.1136
# 2        1       3    46.0614    -23.9386    59.9289     66.3883
# 3        1       4    46.0614    -23.9386    42.8167     58.3167
# 4        1       5    46.0614    -23.9386    43.1167     63.2500
# 5        2       3    48.1792     63.1136    59.9289     66.3883
# 6        2       4    48.1792     63.1136    42.8167     58.3167
# 7        2       5    48.1792     63.1136    43.1167     63.2500
# 8        3       4    59.9289     66.3883    42.8167     58.3167
# 9        3       5    59.9289     66.3883    43.1167     63.2500
# 10       4       5    42.8167     58.3167    43.1167     63.2500
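
The combn() call above works on row positions, which coincide with the Person IDs in this data. Here is a sketch of a variant that looks the IDs up explicitly, so it should also work when IDs are not 1..n. The names `pairs`, `dyads` and `DistanceM` are my own, and the distance column is only added if geosphere happens to be installed.

```r
dat <- read.table(header = TRUE, text = "Person  Latitude    Longitude
1     46.0614     -23.9386
2     48.1792      63.1136
3     59.9289      66.3883
4     42.8167      58.3167
5     43.1167      63.25")

# Each column of combn() is one dyad of row positions; transpose to rows
pairs <- t(combn(seq_len(nrow(dat)), 2))
p1 <- dat[pairs[, 1], ]   # first member of each dyad
p2 <- dat[pairs[, 2], ]   # second member of each dyad

dyads <- data.frame(Person1     = p1$Person,
                    Person2     = p2$Person,
                    LatitudeP1  = p1$Latitude,
                    LongitudeP1 = p1$Longitude,
                    LatitudeP2  = p2$Latitude,
                    LongitudeP2 = p2$Longitude)

# Optional: row-wise distance in meters (longitude first, then latitude)
if (requireNamespace("geosphere", quietly = TRUE)) {
  dyads$DistanceM <- geosphere::distHaversine(
    as.matrix(dyads[, c("LongitudeP1", "LatitudeP1")]),
    as.matrix(dyads[, c("LongitudeP2", "LatitudeP2")]))
}
dyads
```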

Upvotes: 2
