Reputation: 8437
I would like to create a flat data.frame from a tree in R.
The tree is represented by nested lists, each of which contains a key called children that holds more lists with more children.
tree <- list(
  name = "root",
  parent_name = "None",
  children = list(
    list(parent_name = "root", name = "child1", children = list()),
    list(
      parent_name = "root", name = "child2",
      children = list(list(parent_name = "child2", name = "child3", children = c()))
    )
  )
)
I would like to "flatten" this down into a data.frame with the following structure:
name parent_name
1 root None
2 child1 root
3 child2 root
4 child3 child2
I can accomplish this using the following recursive function:
walk_tree <- function(node) {
  # Append this node's row to the global `results` data.frame
  results <<- rbind(
    results,
    data.frame(
      name = node$name,
      parent_name = node$parent_name,
      stringsAsFactors = FALSE
    )
  )
  # Recurse into each child
  for (child in node$children) {
    walk_tree(child)
  }
}
This function works fine but requires me to declare a results data.frame outside of the function:
results <- NULL
walk_tree(tree)
results # now contains the data.frame as desired
Furthermore, the use of the <<- operator causes the following warning to occur when the walk_tree function is included in a package:
Note: no visible binding for '<<-' assignment to 'results'
Using the <- operator instead doesn't work (results evaluates to NULL after running walk_tree).
What is the correct way to recursively build a data.frame from a tree in R?
Upvotes: 4
Views: 727
Reputation: 79338
# Hmm, this seems too easy for the task
rev(data.frame(matrix(stack(tree)[, 1], ncol = 2, byrow = TRUE)))
X2 X1
1 None root
2 child1 root
3 child2 root
4 child3 child2
library(dplyr)  # %>%, mutate(), n()
library(tidyr)  # spread()
stack(tree) %>%
  mutate(new = rep(1:(n() / 2), each = 2), ind = rep(ind[2:1], n() / 2)) %>%
  spread(ind, values)
new name parent_name
1 1 None root
2 2 child1 root
3 3 child2 root
4 4 child3 child2
Upvotes: 0
Reputation: 1877
You could use the excellent tree structure from the ape package and write your data in parenthetic (Newick) format, where commas (,) represent a vertex, brackets represent edges, and your leaves are the "children". The tree is ended by a semicolon (;).
## Reading a tree
my_tree <- "(child1, (child2, child3));"
tree <- ape::read.tree(text = my_tree)
## Getting the edge table (your flatten format)
tree$edge
# [,1] [,2]
#[1,] 4 1
#[2,] 4 5
#[3,] 5 2
#[4,] 5 3
Here 4 is your root (the deepest vertex in the tree, numbered as the number of leaves + 1). It connects "child1" to the vertex 5, and 5 denotes the first vertex linking "child2" and "child3".
You can visualise this structure as follows (using the S3 plot method for phylo objects):
## Plotting the tree
plot(tree)
ape::nodelabels()
You can add extra structures (trees) to any child as follows:
child1_children <- ape::read.tree(text = "(child4, (child5, child6));")
## Adding child1_children to the first leaf
tree2 <- ape::bind.tree(tree, child1_children, where = 1)
## Plotting the tree
plot(tree2)
ape::nodelabels()
tree2$edge
# [,1] [,2]
#[1,] 6 7
#[2,] 7 3
#[3,] 7 8
#[4,] 8 4
#[5,] 8 5
#[6,] 6 9
#[7,] 9 1
#[8,] 9 2
Or remove some using the same principle with ape::drop.tip.
Upvotes: -1
Reputation: 389285
One way is to gather all the "name" and "parent_name" values from every node and build a data.frame from them.
# Flatten the nested structure into a named character vector
u_tree <- unlist(tree)
# Find the elements whose name ends in "parent_name"
inds <- grepl("parent_name$", names(u_tree))
# Combine both sets of values in a data.frame
data.frame(name = u_tree[!inds], parent_name = u_tree[inds])
# name parent_name
# root None
#2 child1 root
#3 child2 root
#4 child3 child2
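To see why the grepl() pattern is enough, it can help to look at the names unlist() generates. The sketch below rebuilds the tree from the question, checks that every flattened element is either a name or a parent_name, and drops the vector names so the row names come out clean. It assumes every node carries exactly one name and one parent_name; the variable names is_parent and flat are only for illustration.

```r
# Tree from the question, repeated so the snippet runs on its own
tree <- list(
  name = "root", parent_name = "None",
  children = list(
    list(parent_name = "root", name = "child1", children = list()),
    list(parent_name = "root", name = "child2",
         children = list(list(parent_name = "child2", name = "child3",
                              children = c())))
  )
)

u_tree <- unlist(tree)

# unlist() joins the nesting path with "."; empty children contribute nothing,
# so every remaining element name ends in "name" or "parent_name"
print(names(u_tree))

is_parent <- grepl("parent_name$", names(u_tree))

# Assumption: each node has exactly one name and one parent_name,
# so both groups must be the same length and line up node by node
stopifnot(sum(is_parent) == sum(!is_parent))

flat <- data.frame(
  name = unname(u_tree[!is_parent]),
  parent_name = unname(u_tree[is_parent]),
  stringsAsFactors = FALSE
)
flat
```

If a node were missing one of the two fields, the stopifnot() check would fail instead of silently misaligning rows.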
Upvotes: 3
Reputation: 47350
You were not far off :). Using dplyr::bind_rows:
walk_tree <- function(node) {
  # Bind this node's row with the rows returned for each child
  dplyr::bind_rows(
    data.frame(
      name = node$name,
      parent_name = node$parent_name,
      stringsAsFactors = FALSE
    ),
    lapply(node$children, walk_tree)
  )
}
walk_tree(tree)
name parent_name
1 root None
2 child1 root
3 child2 root
4 child3 child2
And the base R version:
walk_tree <- function(node) {
  # rbind this node's row with the data.frames returned for each child
  do.call(
    rbind,
    c(
      list(data.frame(
        name = node$name,
        parent_name = node$parent_name,
        stringsAsFactors = FALSE
      )),
      lapply(node$children, walk_tree)
    )
  )
}
walk_tree(tree)
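If you would rather keep the accumulator style from the question, one possible variation is to wrap the recursion in an outer function so that <<- assigns to a variable in the enclosing function instead of the global environment. This is only a sketch: flatten_tree, visit and rows are hypothetical names, and I have not checked it against a package build.

```r
# Tree from the question, repeated so the snippet runs on its own
tree <- list(
  name = "root", parent_name = "None",
  children = list(
    list(parent_name = "root", name = "child1", children = list()),
    list(parent_name = "root", name = "child2",
         children = list(list(parent_name = "child2", name = "child3",
                              children = c())))
  )
)

flatten_tree <- function(tree) {
  rows <- list()  # accumulator lives in flatten_tree's own environment

  visit <- function(node) {
    # <<- finds `rows` in the enclosing function, not in the global env
    rows[[length(rows) + 1L]] <<- data.frame(
      name = node$name,
      parent_name = node$parent_name,
      stringsAsFactors = FALSE
    )
    # Looping over NULL or an empty list simply does nothing
    for (child in node$children) visit(child)
    invisible(NULL)
  }

  visit(tree)
  do.call(rbind, rows)  # bind all collected rows once at the end
}

res <- flatten_tree(tree)
res
```

Collecting the rows in a list and binding once at the end also avoids growing a data.frame with rbind() on every node.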
Upvotes: 1