Augustin

Reputation: 327

Extract subtree in R

I have a tree represented by a data.frame with two columns. The first column is the node id and the second is the parent id, with the convention that the root is its own parent.

For instance,

df <- data.frame(id = 1:10, parent = c(1,1,1,2,2,2,3,3,5,5))

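corresponds to a tree in which node 1 is the root with children 2 and 3, node 2 has children 4, 5 and 6, node 3 has children 7 and 8, and node 5 has children 9 and 10.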

I want to write a function

get_subtree <- function(new_root) {
  ...
}

which returns the vector of ids of the nodes in the subtree rooted at new_root. For example, get_subtree(2) would return the vector c(2,4,5,6,9,10). I guess it should be a recursive function that returns new_root if new_root is a leaf. I can easily get the children of a given node using df %>% filter(parent == new_root) %>% pull(id), but that stops at the first generation. How should I write my function?

NB: I know about the data.tree package. I could convert my data.frame into a data.tree object using mytree <- data.tree::FromDataFrameNetwork(df[-1,]) (here I need to remove the root) and then presumably use built-in functions to get subtrees. The problem is that my tree has more than two million nodes and data.tree::FromDataFrameNetwork just takes too long. I only need a relatively small subtree (a few thousand nodes), so I think it's better to work on the data.frame directly.

Upvotes: 1

Views: 529

Answers (1)

akrun

Reputation: 887691

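Perhaps we can use the igraph package. Reversing the columns with df[2:1] makes each edge point from parent to child, so the subtree is everything reachable from the new root along outgoing edges. igraph's internal vertex indices need not coincide with the ids, so it is safer to look up the root by name and convert the result back with as_ids():

library(igraph)
# Reverse the columns so that each edge points from parent to child
g1 <- graph_from_data_frame(df[2:1])
# Select the root by vertex name and return names rather than internal indices
sort(as.numeric(as_ids(subcomponent(g1, as.character(2), mode = "out"))))
#[1]  2  4  5  6  9 10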
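
If building a graph object for the whole tree is too heavy, the same idea can probably be sketched in base R directly on the data.frame. This is only an illustration, not a tuned solution: it walks the tree one generation at a time rather than recursing node by node, and it assumes the tree has no cycles other than the root being its own parent. The tree argument and its default are my own additions to the signature in the question.

```r
# Example data from the question: the root (1) is its own parent
df <- data.frame(id = 1:10, parent = c(1, 1, 1, 2, 2, 2, 3, 3, 5, 5))

# Sketch of get_subtree(); the `tree` argument is an assumption added here
# so the function does not depend on a global variable
get_subtree <- function(new_root, tree = df) {
  result <- new_root
  frontier <- new_root
  while (length(frontier) > 0) {
    # Children of every node in the current generation; the self-parented
    # root is excluded so that the loop terminates
    is_child <- tree$parent %in% frontier & tree$id != tree$parent
    frontier <- tree$id[is_child]
    result <- c(result, frontier)
  }
  sort(result)
}

get_subtree(2)
# [1]  2  4  5  6  9 10
```

Each pass scans the whole parent column once, so the cost is roughly the number of rows times the depth of the subtree. For many queries on the same tree, precomputing the children with split(df$id, df$parent) might be worth trying instead.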

Upvotes: 1
