Reputation: 327
I have a tree represented by a data.frame with two columns. The first column is the node id and the second is the parent id; the root is its own parent.
For instance,
df <- data.frame(id = 1:10, parent = c(1,1,1,2,2,2,3,3,5,5))
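corresponds to the following tree: node 1 is the root, with children 2 and 3; node 2 has children 4, 5 and 6; node 3 has children 7 and 8; and node 5 has children 9 and 10.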
I want to write a function
get_subtree <- function(new_root) {
...
}
which returns the vector of ids of the nodes in the subtree rooted at new_root. For example, get_subtree(2) would return the vector c(2,4,5,6,9,10). I guess it should be a recursive function that returns new_root if new_root is a leaf. I can easily get the children of a given node using df %>% filter(parent == new_root) %>% pull(id), but that stops at the first generation. So how should I write my function?
NB: I know about the package data.tree. I could transform my data.frame into a data.tree object using mytree <- data.tree::FromDataFrameNetwork(df[-1,]) (here I need to remove the root), and then I guess I could easily use built-in functions to get subtrees. The problem is that my tree has more than two million nodes and data.tree::FromDataFrameNetwork just takes too long. I only need a relatively small subtree (a few thousand nodes), so I think it's better to work on the data.frame directly.
Upvotes: 1
Views: 529
Reputation: 887691
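Perhaps we can use igraph:
library(igraph)
# Reverse the columns so that each edge points from parent to child
g1 <- graph_from_data_frame(df[2:1])
# Look the root up by vertex name ("2") and return names rather than internal vertex indices
sort(as.numeric(subcomponent(g1, "2", mode = "out")$name))
#[1] 2 4 5 6 9 10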
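If you would rather stay on the data.frame, as the question suggests, here is a minimal base-R sketch that walks the tree one generation at a time instead of recursing node by node. The name get_subtree_bfs is hypothetical, and the function takes df as an extra argument, which differs from the signature in the question. It assumes the ids are unique and that the only cycle is the root pointing to itself. Each generation costs one scan of the whole table, so I have not checked how it behaves on two million rows.

```r
# Example data from the question
df <- data.frame(id = 1:10, parent = c(1, 1, 1, 2, 2, 2, 3, 3, 5, 5))

# get_subtree_bfs is a hypothetical helper, not part of any package
get_subtree_bfs <- function(df, new_root) {
  # Drop the root's self-reference so the root is never treated as its own child
  edges <- df[df$id != df$parent, ]
  result <- new_root
  frontier <- new_root
  while (length(frontier) > 0) {
    # Children of every node in the current generation, in one vectorised pass
    frontier <- edges$id[edges$parent %in% frontier]
    result <- c(result, frontier)
  }
  result
}

get_subtree_bfs(df, 2)
#[1]  2  4  5  6  9 10
```

The loop stops as soon as a generation has no children, so a leaf simply returns itself.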
Upvotes: 1