EmilA

Reputation: 167

Convert distance list to distance matrix

In R, I have a data.frame showing the distance between pairs of nodes:

dl <- data.frame(
  a = c('a','a','a','b','b','c'),
  b = c('b','c','d','c','d','d'),
  dist = c(1,2,3,2,1,2)
)

I want to convert this to a distance matrix, with the diagonal set to zero and the upper triangle set to NA, since the distances are symmetrical:

# Column a holds the distances from a to a, b, c, d as given in dl
dm <- as.matrix(data.frame(
  a = c(0, 1, 2, 3),
  b = c(NA, 0, 2, 1),
  c = c(NA, NA, 0, 2),
  d = c(NA, NA, NA, 0),
  row.names = c('a','b','c','d')
))

My real data is very large, so computational efficiency is key. The only solutions I can come up with myself involve either looping or using igraph to first convert the list to a graph and then converting that graph to a matrix, and neither is ideal given the size of my data. The input is a data.frame because the node ids are text while the distances are numeric, and the desired output is a matrix because speed is key.

Upvotes: 3

Views: 218

Answers (2)

jblood94

Reputation: 16981

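Building a sparse matrix will be fast. Note that the entries not listed in dl (the diagonal and the upper triangle) are implicit zeros here, not NA.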

library(Matrix)
u <- unique(unlist(dl[,1:2]))
dm <- sparseMatrix(match(dl$b, u), match(dl$a, u), x = dl$dist, repr = "T",
                   dims = rep(length(u), 2), dimnames = list(u, u))

dm
#> 4 x 4 sparse Matrix of class "dgTMatrix"
#>   a b c d
#> a . . . .
#> b 1 . . .
#> c 2 2 . .
#> d 3 1 2 .
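If the exact layout from the question is needed (zero diagonal, NA upper triangle), one possible follow-up is to densify the sparse result. This is only a sketch and assumes an n x n dense matrix fits in memory, which may not hold for very large inputs; the names sm and dense are made up for illustration.

```r
library(Matrix)

dl <- data.frame(
  a = c('a','a','a','b','b','c'),
  b = c('b','c','d','c','d','d'),
  dist = c(1,2,3,2,1,2)
)

u <- unique(unlist(dl[, 1:2]))
# Rows come from column b, columns from column a, so values land in the lower triangle
sm <- sparseMatrix(match(dl$b, u), match(dl$a, u), x = dl$dist,
                   dims = rep(length(u), 2), dimnames = list(u, u))

# Assumption: the dense n x n matrix is small enough to allocate
dense <- as.matrix(sm)
dense[upper.tri(dense)] <- NA   # the diagonal stays 0, the upper triangle becomes NA
```

For large data it is probably better to keep working with the sparse object and skip this step.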

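If dist objects will work, you can build one directly. This will also be very fast. (This assumes dl contains every pair exactly once and is sorted by a, then b, which matches the column-wise lower-triangle order in which a dist object stores its values.)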

dm <- dl$dist
class(dm) <- "dist"
attr(dm, "Labels") <- unique(unlist(dl[,1:2]))
attr(dm, "Size") <- length(attr(dm, "Labels"))
attr(dm, "Diag") <- FALSE
attr(dm, "Upper") <- FALSE
attr(dm, "method") <- "euclidean"

dm
#>   a b c
#> b 1    
#> c 2 2  
#> d 3 1 2
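If the rows of dl are not already in that order, a sketch like the following could reorder them first. It assumes every pair appears exactly once; the helper names labels, lo, hi and ord are made up for illustration, and the example input is the question's data deliberately shuffled.

```r
# Same pairs as in the question, in shuffled order
dl <- data.frame(
  a = c('c','a','b','a','b','a'),
  b = c('d','c','d','b','c','d'),
  dist = c(2,2,1,1,2,3)
)

labels <- sort(unique(unlist(dl[, 1:2])))
n <- length(labels)

# Integer position of each endpoint; lo is the smaller index, hi the larger
i <- match(dl$a, labels)
j <- match(dl$b, labels)
lo <- pmin(i, j)
hi <- pmax(i, j)

# A dist object needs each of the n * (n - 1) / 2 pairs exactly once
stopifnot(nrow(dl) == n * (n - 1) / 2, !anyDuplicated(cbind(lo, hi)))

# Column-wise lower-triangle order: by column (lo), then by row (hi)
ord <- order(lo, hi)
dm <- structure(dl$dist[ord], class = "dist", Labels = labels, Size = n,
                Diag = FALSE, Upper = FALSE)
```

Calling as.matrix(dm) afterwards is a quick way to check that each distance landed on the intended pair.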

Upvotes: 4

ThomasIsCoding

Reputation: 101753

Here are some base R options

Use xtabs

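# Give both node columns the same factor levels so the table is square
v <- unique(unlist(dl[-3]))
dl[-3] <- lapply(dl[-3], factor, levels = v)
# Cross-tabulate (a in rows, b in columns), then transpose into the lower triangle
dm <- unclass(t(xtabs(dist ~ ., dl)))
dm[upper.tri(dm)] <- NA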

which gives

> dm
   a
b   a  b  c  d
  a 0 NA NA NA
  b 1  0 NA NA
  c 2  2  0 NA
  d 3  1  2  0
attr(,"call")
xtabs(formula = dist ~ ., data = dl)

Use as.dist

v <- unique(unlist(dl[-3]))
dl[-3] <- lapply(dl[-3], factor, levels = v)
dm <- as.dist(t(xtabs(dist ~ ., dl)), diag = TRUE)

which gives

> dm
  a b c d
a 0
b 1 0
c 2 2 0  
d 3 1 2 0

Use matrix

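v <- unique(unlist(dl[-3]))
# Start from an all-NA matrix with a zero diagonal
dm <- `diag<-`(matrix(NA, length(v), length(v), dimnames = list(v, v)), 0)
# Index by (row name, column name) pairs: b gives the row, a gives the column
dm[as.matrix(rev(dl[-3]))] <- dl$dist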

which gives

> dm
  a  b  c  d
a 0 NA NA NA
b 1  0 NA NA
c 2  2  0 NA
d 3  1  2  0
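A variant of the matrix option indexes by integer position instead of by name, which might avoid name lookups on large inputs; that is an untested assumption worth benchmarking on the real data. The sketch also assumes each pair is listed with the node that comes first in v in column a, otherwise the value would land in the upper triangle. The name idx is made up for illustration.

```r
dl <- data.frame(
  a = c('a','a','a','b','b','c'),
  b = c('b','c','d','c','d','d'),
  dist = c(1,2,3,2,1,2)
)

v <- unique(unlist(dl[1:2]))
n <- length(v)

# All-NA numeric matrix with a zero diagonal
dm <- matrix(NA_real_, n, n, dimnames = list(v, v))
diag(dm) <- 0

# Two-column integer matrix of (row, column) positions: b gives the row, a the column
idx <- cbind(match(dl$b, v), match(dl$a, v))
dm[idx] <- dl$dist
```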

Upvotes: 5
