Reputation: 40499
I have a matrix m
m <- matrix (
c( 2, 1, 8, 5,
7, 6, 3, 4,
9, 3, 2, 8,
1, 3, 7, 4),
nrow = 4,
ncol = 4,
byrow = TRUE)
rownames(m) <- c('A', 'B', 'C', 'D')
Now, I'd like to order the rows of m
based on their respective distance, so I use dist()
dist_m <- dist(m)
dist_m
is, when printed
          A         B         C
B  8.717798
C  9.899495  5.477226
D  2.645751  7.810250 10.246951
Since I want it ordered, I try sort(dist_m)
which prints
[1] 2.645751 5.477226 7.810250 8.717798 9.899495 10.246951
This is almost what I want, but I'd be happier if it also printed the names of the two rows that each distance belongs to, something like
2.645751 A D
5.477226 B C
7.810250 B D
8.717798 A B
9.899495 A C
10.246951 C D
This is certainly possible, but I have no idea how I could achieve this.
Upvotes: 4
Views: 1861
Reputation: 11
If you do have distance values = 0 in your dist object
I started using the solution posted by akrun to sort the output of a dist object, but in my case I do have distance values equal to 0. To avoid discarding these in the subset step, I first set the upper triangle to NA, and then the diagonal to NA as well, using diag (I actually obtained a symmetric matrix from another program). Finally, instead of subset, I used melt, na.omit, and order:
library(reshape2)
#create matrix
m <- matrix (
c( 2, 1, 8, 5,
2, 1, 8, 5,
9, 3, 2, 8,
1, 3, 7, 4),
nrow = 4,
ncol = 4,
byrow = TRUE)
rownames(m) <- c('A', 'B', 'C', 'D')
# use dist
dist_m <- dist(m)
dist_m
# A and B are identical, so their distance is 0:
#           A         B         C
# B  0.000000
# C  9.899495  9.899495
# D  2.645751  2.645751 10.246951
m1 <- as.matrix(dist_m)
m1[upper.tri(m1)] <- NA
diag(m1) <- NA
m2 <- melt(m1)
na.omit(m2[order(m2$value),3:1])
As a result, the pairwise distance value between A and B is preserved:
       value Var2 Var1
2   0.000000    A    B
4   2.645751    A    D
8   2.645751    B    D
3   9.899495    A    C
7   9.899495    B    C
12 10.246951    C    D
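The same idea can probably be done without reshape2 and without the NA step. The following is only a sketch, not part of the original answer: it selects the lower triangle by position with lower.tri() and which(arr.ind = TRUE), so no test on the values is involved and zero distances are kept. The names idx and pairs_df are made up for illustration.

```r
# Rebuild the example with two identical rows (A and B), as in this answer.
m <- matrix(c(2, 1, 8, 5,
              2, 1, 8, 5,
              9, 3, 2, 8,
              1, 3, 7, 4),
            nrow = 4, ncol = 4, byrow = TRUE)
rownames(m) <- c("A", "B", "C", "D")

m1 <- as.matrix(dist(m))

# (row, col) positions of the strictly lower triangle. The selection is
# purely positional, so a distance of 0 is not dropped.
idx <- which(lower.tri(m1), arr.ind = TRUE)

# idx, pairs_df: hypothetical names used only in this sketch.
pairs_df <- data.frame(value = m1[idx],
                       Var2  = colnames(m1)[idx[, "col"]],
                       Var1  = rownames(m1)[idx[, "row"]])
pairs_df <- pairs_df[order(pairs_df$value), ]
pairs_df
```

The first row of pairs_df should then be the A/B pair with distance 0.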
Upvotes: 1
Reputation: 2897
Using base R:
dm <- as.matrix(dist_m)
df <- data.frame(data = c(dm),
                 column = c(col(dm)),
                 row = c(row(dm)))
# keep only one triangle
df <- df[df$row > df$column, ]
# replace the numeric indices with the row names (A, B, C, D here)
df$row <- rownames(dm)[df$row]
df$column <- colnames(dm)[df$column]
# put in order
df[order(df$data), ]
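Another base R possibility, given here as a sketch rather than as part of the original answer, is to skip the conversion to a full matrix. A dist object stores the lower triangle column by column, and combn() enumerates the label pairs in that same order, so the two can be bound together directly. The names labels, pairs and res are made up for illustration.

```r
# Example matrix from the question.
m <- matrix(c(2, 1, 8, 5,
              7, 6, 3, 4,
              9, 3, 2, 8,
              1, 3, 7, 4),
            nrow = 4, ncol = 4, byrow = TRUE)
rownames(m) <- c("A", "B", "C", "D")
dist_m <- dist(m)

# The "Labels" attribute of a dist object holds the row names of the input.
labels <- attr(dist_m, "Labels")

# All unordered pairs of labels, one pair per row: (A,B), (A,C), (A,D), ...
# This matches the storage order of the distances inside dist_m.
pairs <- t(combn(labels, 2))

# labels, pairs, res: hypothetical names used only in this sketch.
res <- data.frame(distance = as.vector(dist_m),
                  first    = pairs[, 1],
                  second   = pairs[, 2])
res <- res[order(res$distance), ]
res
```

Because nothing is filtered by value, pairs with a distance of 0 would also be kept.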
Upvotes: 0
Reputation: 887088
One option would be to convert the dist to a matrix, replace the upper triangle values with 0, melt, subset the non-zero values, and then order based on the 'value' column.
m1 <- as.matrix(dist_m)
m1[upper.tri(m1)] <- 0
library(reshape2)
m2 <- subset(melt(m1), value!=0)
m2[order(m2$value),3:1]
#       value Var2 Var1
#4   2.645751    A    D
#7   5.477226    B    C
#8   7.810250    B    D
#2   8.717798    A    B
#3   9.899495    A    C
#12 10.246951    C    D
Or a base R option suggested by @David Arenburg, after getting 'm1':
m2 <- cbind(which(m1!=0, arr.ind=TRUE), value= m1[m1!=0])
m2[order(m2[,'value']),]
Upvotes: 4