Reputation: 167
In R, I have a data.frame showing the distance between pairs of nodes:
dl <- data.frame(
a = c('a','a','a','b','b','c'),
b = c('b','c','d','c','d','d'),
dist = c(1,2,3,2,1,2)
)
I want to convert this to a distance matrix, with the diagonal set to zero and the upper triangle set to NA, since the distances are symmetrical:
dm <- as.matrix(data.frame(
a = c(0,1,2,3),
b = c(NA, 0, 2, 1),
c = c(NA, NA, 0, 2),
d = c(NA, NA, NA, 0),
row.names = c('a','b','c','d')
))
My real data is very large, so computational efficiency is key. The only solutions I can come up with myself involve either looping or using igraph to first convert the list to a graph and then converting that graph to a matrix, and that's not really ideal given the size of my data. The input is a data.frame since node IDs are text while distances are numeric, and the desired output is a matrix since speed is key.
Upvotes: 3
Views: 218
Reputation: 16981
Building a sparse matrix will be fast.
library(Matrix)
u <- unique(unlist(dl[,1:2]))
dm <- sparseMatrix(match(dl$b, u), match(dl$a, u), x = dl$dist, repr = "T",
dims = rep(length(u), 2), dimnames = list(u, u))
dm
#> 4 x 4 sparse Matrix of class "dgTMatrix"
#>   a b c d
#> a . . . .
#> b 1 . . .
#> c 2 2 . .
#> d 3 1 2 .
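If a dense matrix with a zero diagonal and an NA upper triangle is still needed, as in the question, the sparse result can be converted afterwards. The following is only a sketch: the names sm and dense are made up for illustration, and densifying allocates a full n x n matrix, so it may not be practical for very large inputs. It also assumes that a 0 in the lower triangle can be read as "no entry", because a true zero distance is indistinguishable from a missing pair after conversion.

```r
library(Matrix)

# Example edge list from the question
dl <- data.frame(
  a = c('a','a','a','b','b','c'),
  b = c('b','c','d','c','d','d'),
  dist = c(1,2,3,2,1,2)
)

u <- unique(unlist(dl[, 1:2]))

# Sparse lower-triangular matrix: rows come from column b, columns from column a
sm <- sparseMatrix(match(dl$b, u), match(dl$a, u), x = dl$dist,
                   dims = rep(length(u), 2), dimnames = list(u, u))

# Densify: entries that were never set become 0, so the diagonal is already 0
dense <- as.matrix(sm)

# Mask the upper triangle to match the layout requested in the question
dense[upper.tri(dense)] <- NA
dense
```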
If dist objects will work, you can build one directly. This will also be very fast. (This assumes dl contains every pair once and is sorted by a then b, because a dist object stores the lower triangle column by column.)
dm <- dl$dist
class(dm) <- "dist"
attr(dm, "Labels") <- unique(unlist(dl[,1:2]))
attr(dm, "Size") <- length(attr(dm, "Labels"))
attr(dm, "Diag") <- FALSE
attr(dm, "Upper") <- FALSE
attr(dm, "method") <- "euclidean"
dm
#>   a b c
#> b 1
#> c 2 2
#> d 3 1 2
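When the rows are not already in that order, one possible way to get there is to reorder them before building the dist object. This is a sketch under assumptions, not part of the answer above: the helper names lo, hi and ord are hypothetical, the labels are taken in sorted order, and every unordered pair is assumed to appear exactly once (the length check only catches a wrong number of rows, not duplicates).

```r
# Same distances as in the question, but with the rows shuffled
dl <- data.frame(
  a = c('c','a','b','a','b','a'),
  b = c('d','c','d','b','c','d'),
  dist = c(2,2,1,1,2,3)
)

labels <- sort(unique(unlist(dl[, 1:2])))
n <- length(labels)

i <- match(dl$a, labels)
j <- match(dl$b, labels)

# In the lower triangle the column index is the smaller one, the row index the larger
lo <- pmin(i, j)
hi <- pmax(i, j)

# dist stores the lower triangle column by column: order by column, then by row
ord <- order(lo, hi)

# Necessary (not sufficient) check that all n * (n - 1) / 2 pairs are present
stopifnot(length(ord) == n * (n - 1) / 2)

d <- structure(dl$dist[ord], class = "dist", Labels = labels, Size = n,
               Diag = FALSE, Upper = FALSE)

# as.matrix() expands to a full symmetric matrix with a zero diagonal
as.matrix(d)
```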
Upvotes: 4
Reputation: 101753
Here are some base R options
xtabs
v <- unique(unlist(dl[-3]))
dl[-3] <- lapply(dl[-3], factor, levels = v)
dm <- unclass(t(xtabs(dist ~ ., dl)))
dm[upper.tri(dm)] <- NA
which gives
> dm
   a
b    a  b  c  d
  a  0 NA NA NA
  b  1  0 NA NA
  c  2  2  0 NA
  d  3  1  2  0
attr(,"call")
xtabs(formula = dist ~ ., data = dl)
as.dist
v <- unique(unlist(dl[-3]))
dl[-3] <- lapply(dl[-3], factor, levels = v)
dm <- as.dist(t(xtabs(dist ~ ., dl)), diag = TRUE)
which gives
> dm
  a b c d
a 0
b 1 0
c 2 2 0
d 3 1 2 0
matrix
v <- unique(unlist(dl[-3]))
dm <- `diag<-`(matrix(NA, length(v), length(v), dimnames = list(v, v)), 0)
dm[as.matrix(rev(dl[-3]))] <- dl$dist
which gives
> dm
  a  b  c  d
a 0 NA NA NA
b 1  0 NA NA
c 2  2  0 NA
d 3  1  2  0
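A variation on the matrix option, shown only as a sketch: the character lookup is replaced by a two-column integer index built with match(). The name idx is made up for illustration. Whether this is actually faster on large data has not been measured here, so it would need benchmarking on the real input.

```r
# Example edge list from the question
dl <- data.frame(
  a = c('a','a','a','b','b','c'),
  b = c('b','c','d','c','d','d'),
  dist = c(1,2,3,2,1,2)
)

v <- unique(unlist(dl[1:2]))
n <- length(v)

# Start from an all-NA numeric matrix and zero the diagonal
dm <- matrix(NA_real_, n, n, dimnames = list(v, v))
diag(dm) <- 0

# Two-column integer index: row taken from column b, column taken from column a
idx <- cbind(match(dl$b, v), match(dl$a, v))
dm[idx] <- dl$dist
dm
```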
Upvotes: 5