Austin Richardson

Reputation: 8437

Correct way to recursively create a data.frame from a tree

I would like to create a flat data.frame from a tree in R.

The tree is represented by a nested list: each node is a list with a name, a parent_name, and a key called children that holds a list of further nodes, which may have children of their own.

tree <-
  list(name="root",
       parent_name='None',
       children=list(
         list(parent_name="root", name="child1", children=list()),
         list(parent_name="root", name="child2", children=list(list(parent_name="child2", name="child3", children=c())))
       )
      )

I would like to "flatten" this down into a data.frame with the following structure:

    name parent_name
1   root        None
2 child1        root
3 child2        root
4 child3      child2

I can accomplish this using the following recursive function:

walk_tree <- function(node) {
  results <<- rbind(
    results,
    data.frame(
      name=node$name,
      parent_name=node$parent_name,
      stringsAsFactors=FALSE
    )
  )

  for (child in node$children) {  # separate loop variable, so it doesn't shadow node
    walk_tree(child)
  }
}

This function works fine but requires me to declare a results data.frame outside of the function:

results <- NULL
walk_tree(tree)
results # now contains the data.frame as desired

Furthermore, when walk_tree is included as a function in a package, the use of the <<- operator causes R CMD check to report the following note:

Note: no visible binding for '<<-' assignment to 'results'

Using the <- operator instead doesn't work: <- creates a local variable inside each call, so results is still NULL after running walk_tree.
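A minimal sketch of the scoping rules behind both problems, in base R only. The names add_local, collect_names and toy_tree are hypothetical, and collect_names only gathers node names; it illustrates how <<- behaves when the variable lives in an enclosing function instead of the global environment.

```r
# `<-` inside a function creates a new local variable; the `results`
# in the calling (here: global) environment is never modified.
results <- NULL
add_local <- function(x) {
  results <- c(results, x)  # local copy, discarded when the call returns
  invisible(NULL)
}
add_local("a")
print(is.null(results))  # TRUE: the global `results` is unchanged

# `<<-` assigns to the first existing binding found in the enclosing
# environments. Defining `results` inside an outer function gives the
# recursive helper somewhere to write that is not the global environment.
collect_names <- function(tree) {
  results <- character(0)
  visit <- function(node) {
    results <<- c(results, node$name)  # updates collect_names()'s `results`
    for (child in node$children) visit(child)
  }
  visit(tree)
  results
}

# Hypothetical toy tree with the same shape as the question's nodes.
toy_tree <- list(
  name = "root", parent_name = "None",
  children = list(list(name = "child1", parent_name = "root", children = list()))
)
print(collect_names(toy_tree))
```

Because results is a local variable of collect_names() here, R CMD check should be able to see where the <<- assignment lands, though that is worth confirming in your own package.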

What is the correct way to recursively build a data.frame from a tree in R?

Upvotes: 4

Views: 727

Answers (4)

Onyambu

Reputation: 79338

rev(data.frame(matrix(stack(tree)[, 1], ncol = 2, byrow = TRUE))) # relies on the field order within each node, so the root row comes out swapped
      X2     X1
1   None   root
2 child1   root
3 child2   root
4 child3 child2

library(dplyr)  # mutate(), n(), %>%
library(tidyr)  # spread()

stack(tree) %>%
  mutate(new = rep(1:(n()/2), each = 2), ind = rep(ind[2:1], n()/2)) %>%
  spread(ind, values)
  new   name parent_name
1   1   None        root
2   2 child1        root
3   3 child2        root
4   4 child3      child2

Upvotes: 0

Thomas Guillerme

Reputation: 1877

You could use the phylo tree structure from the excellent ape package and write your data in parenthetical (Newick) format, where each pair of parentheses encloses the children of an internal node, commas separate sibling nodes, your leaves (tips) are the "children", and the tree is terminated by a semicolon (;).

## Reading a tree
my_tree <- "(child1, (child2, child3));"
tree <- ape::read.tree(text = my_tree)

## Getting the edge table (your flatten format)
tree$edge
#     [,1] [,2]
#[1,]    4    1
#[2,]    4    5
#[3,]    5    2
#[4,]    5    3

Here 4 is your root: ape numbers the tips (leaves) 1 to n and the root n + 1, so with 3 tips the root is 4. The root connects "child1" (tip 1) to the internal node 5, which in turn links "child2" and "child3". You can visualise this structure as follows (using the S3 plot method for phylo objects):

## Plotting the tree
plot(tree)
ape::nodelabels()
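To get from the edge table to the name/parent_name layout of the question, each vertex number can be mapped to a label. The sketch below hard-codes the edge matrix printed above so it runs without ape, and assumes the tip labels are numbered in the order they appear in the Newick string (what tree$tip.label would hold). The internal nodes carry no labels in this example, so "node4" and "node5" are made-up placeholders. Note that in this Newick string child3 is a sibling of child2, not its child.

```r
# The edge matrix printed above, hard-coded so this runs without ape:
# column 1 is the parent vertex, column 2 the child vertex.
edge <- matrix(c(4, 1,
                 4, 5,
                 5, 2,
                 5, 3), ncol = 2, byrow = TRUE)
tip_labels <- c("child1", "child2", "child3")  # assumed tree$tip.label

# Tips are numbered 1..n and internal nodes n+1, n+2, ...; the internal nodes
# here have no labels, so "node4" and "node5" are hypothetical placeholders.
n_tips <- length(tip_labels)
node_labels <- paste0("node", (n_tips + 1):max(edge))
all_labels <- c(tip_labels, node_labels)

edge_df <- data.frame(
  name        = all_labels[edge[, 2]],
  parent_name = all_labels[edge[, 1]],
  stringsAsFactors = FALSE
)
print(edge_df)
```

With a real phylo object, tree$edge, tree$tip.label and, if present, tree$node.label would take the place of the hard-coded values.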

You can add extra structures (trees) to any child as follows:

child1_children <- ape::read.tree(text = "(child4, (child5, child6));")
## Adding child1_children to the first leaf
tree2 <- ape::bind.tree(tree, child1_children, where = 1)
## Plotting the tree
plot(tree2)
ape::nodelabels()
tree2$edge
#     [,1] [,2]
#[1,]    6    7
#[2,]    7    3
#[3,]    7    8
#[4,]    8    4
#[5,]    8    5
#[6,]    6    9
#[7,]    9    1
#[8,]    9    2

Or remove tips in the same way with ape::drop.tip.

Upvotes: -1

Ronak Shah

Reputation: 389285

One way is to flatten the whole tree with unlist() and then pair up the "name" and "parent_name" values in a data frame.

#Flatten the nested structure
u_tree <- unlist(tree)

#Gather the indices of all elements whose name ends in "parent_name"
inds <- grepl("parent_name$", names(u_tree))

#Add them in a dataframe
data.frame(name = u_tree[!inds], parent_name = u_tree[inds])

#    name parent_name
#1   root        None
#2 child1        root
#3 child2        root
#4 child3      child2
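To see why the grepl() step works, this sketch prints the names unlist() generates for the question's tree. It then selects both fields explicitly instead of taking everything that is not a parent_name; that is a variation, not the answer's original code. Like the original, it assumes every node has exactly one name and one parent_name, so the two vectors line up by position.

```r
# The question's tree, copied from above.
tree <- list(name = "root", parent_name = "None",
  children = list(
    list(parent_name = "root", name = "child1", children = list()),
    list(parent_name = "root", name = "child2",
         children = list(list(parent_name = "child2", name = "child3", children = c())))
  ))

u_tree <- unlist(tree)
# unlist() joins the path of list names with ".", so every element name
# ends in "name" or "parent_name"; empty or NULL children contribute nothing.
print(names(u_tree))

# "(^|\\.)name$" matches "name" and "children.name" but not "parent_name",
# because the "_" before "name" is neither "." nor the start of the string.
is_parent <- grepl("(^|\\.)parent_name$", names(u_tree))
is_name   <- grepl("(^|\\.)name$", names(u_tree))

flat <- data.frame(
  name        = unname(u_tree[is_name]),
  parent_name = unname(u_tree[is_parent]),
  stringsAsFactors = FALSE
)
print(flat)
```

Matching the name field explicitly means an extra character field on a node would not silently end up in the name column.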

Upvotes: 3

moodymudskipper

Reputation: 47350

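You were not far off :). Instead of writing to a global with <<-, let each call return the row for its own node bound to the rows returned by the recursive calls on its children, for example with dplyr::bind_rows: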

walk_tree <- function(node) {
  dplyr::bind_rows(
    data.frame(
      name=node$name,
      parent_name=node$parent_name,
      stringsAsFactors=FALSE),
    lapply(node$children,walk_tree)
  )
}

walk_tree(tree)

    name parent_name
1   root        None
2 child1        root
3 child2        root
4 child3      child2

And the base R version:

walk_tree <- function(node) {
  do.call(
    rbind,
    c(
      list(data.frame(
        name=node$name,
        parent_name=node$parent_name,
        stringsAsFactors=FALSE)),
      lapply(node$children,walk_tree)
    )
  )
}

walk_tree(tree)
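Both versions above create one small data.frame per node and then combine them. A sketch of a variation (not part of the original answer; flatten_tree is a hypothetical name) returns plain character vectors from each recursive call and builds a single data.frame at the end, which avoids many small rbind() calls on data frames.

```r
# The question's tree, copied from above.
tree <- list(name = "root", parent_name = "None",
  children = list(
    list(parent_name = "root", name = "child1", children = list()),
    list(parent_name = "root", name = "child2",
         children = list(list(parent_name = "child2", name = "child3", children = c())))
  ))

# Returns list(name = <chr>, parent_name = <chr>) for the subtree rooted at
# `node`, in pre-order (the node first, then each child's subtree in turn).
flatten_tree <- function(node) {
  kids <- lapply(node$children, flatten_tree)  # empty list for leaves
  list(
    name        = c(node$name,        unlist(lapply(kids, `[[`, "name"))),
    parent_name = c(node$parent_name, unlist(lapply(kids, `[[`, "parent_name")))
  )
}

flat <- as.data.frame(flatten_tree(tree), stringsAsFactors = FALSE)
print(flat)
```

The row order matches the walk_tree versions, since both visit a node before its children.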

Upvotes: 1
