Reputation: 41
I am trying to make a tree in R and calculate the distance between 2 nodes.
The data frame to make the tree is like:
tree.source <- data.frame(ID = 1:10, parentID = c(NA,1,1,1,2,2,2,3,4,4))
#ID parentID
#1 NA
#2 1
#3 1
#4 1
#5 2
#6 2
#7 2
#8 3
#9 4
#10 4
And hope to create a tree structure like this
In addition, I also want to get the distance between 2 nodes. For example, the distance between Node 5 and 10 here is 4, through 5-2-1-4-10. There are 4 edges to link them. And the distance between Node 2 and 8 is 3, through 2-1-3-8.
The tree can be build by using data.tree
package with paths for each node, for example, the PathString
for Node 10 should be given as 1/4/10, but the PathString
can be very long when the number of levels increase. Is there a better way to build the tree?
Upvotes: 1
Views: 804
Reputation: 23129
Try this (with your given tree.source):
library(igraph)
g <- graph.data.frame(tree.source[-1,2:1], directed = FALSE)
plot(g)
# do a bfs with root as source, you will get distance of each vertex from root
bfs(g, root=1, "out", dist=TRUE)$dist
# 1 2 3 4 5 6 7 8 9 10
# 0 1 1 1 2 2 2 2 2 2
# shortest path from 5 to 10
sp <- unlist(shortest_paths(g, 5, 10, mode="out")$vpath)
sp
# 5 2 1 4 10
# 5 2 1 4 10
# distance from 5 to 10 = # vertices on the path - 1
length(sp)-1
# [1] 4
# shortest paths from source node 5 to all
sp_from_5 <- shortest_paths(g, 5, mode="out")$vpath
names(sp_from_5) <- names(V(g))
sp_from_5
# output
$`1`
+ 3/10 vertices, named:
[1] 5 2 1
$`2`
+ 2/10 vertices, named:
[1] 5 2
$`3`
+ 4/10 vertices, named:
[1] 5 2 1 3
$`4`
+ 4/10 vertices, named:
[1] 5 2 1 4
$`5`
+ 1/10 vertex, named:
[1] 5
$`6`
+ 3/10 vertices, named:
[1] 5 2 6
$`7`
+ 3/10 vertices, named:
[1] 5 2 7
$`8`
+ 5/10 vertices, named:
[1] 5 2 1 3 8
$`9`
+ 5/10 vertices, named:
[1] 5 2 1 4 9
$`10`
+ 5/10 vertices, named:
[1] 5 2 1 4 10
Upvotes: 2
Reputation: 41
The tree can be generated by using:
tree <- as.Node(tree.source[-1,],mode = "network")
as.Node function can generate a tree with a network, which has the first column as "from" and the second as "to" and the following column as the attributes.
And distance(g, 2, 8)
can give the distance between Node 2 and 8.
Upvotes: 3