Mark


Reducing the number of nodes of a tree, to obtain nodes with more than one child node

The following tree:

[image: the original tree]

has been obtained from the following matrix

> mat
7  23 47 41 31
7  23 53 41 31
7  23 53 41 37
7  29 47 41 31
7  29 47 41 37
7  29 53 41 31
7  29 53 41 37
11 29 53 41 31
11 29 53 41 37

taking each column of 'mat' as a level of the tree. If 'data' is the data frame where the matrix 'mat' is stored

V1 V2 V3 V4 V5
7  23 47 41 31
7  23 53 41 31
7  23 53 41 37
7  29 47 41 31
7  29 47 41 37
7  29 53 41 31
7  29 53 41 37
11 29 53 41 31
11 29 53 41 37

the code that produces the tree above is the following:

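> library(data.tree)   # as.Node(), ToDiagrammeRGraph()
> library(DiagrammeR)  # export_graph()
> data$pathString <- paste("0", data$V1, data$V2, data$V3, data$V4, data$V5, sep = "/")
> p_tree <- as.Node(data)
> export_graph(ToDiagrammeRGraph(p_tree), "tree.png")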
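For readers who want to reproduce the setup, here is a minimal base-R sketch of how 'data' and the path strings could be built from the values shown above. It assumes the matrix is entered by hand and relies on as.data.frame() naming the columns V1..V5; the variable name 'path' is only illustrative. The do.call() form is an alternative to listing every column in paste().

```r
# Rebuild the matrix from the question, one row per root-to-leaf path
mat <- matrix(c(
   7, 23, 47, 41, 31,
   7, 23, 53, 41, 31,
   7, 23, 53, 41, 37,
   7, 29, 47, 41, 31,
   7, 29, 47, 41, 37,
   7, 29, 53, 41, 31,
   7, 29, 53, 41, 37,
  11, 29, 53, 41, 31,
  11, 29, 53, 41, 37), ncol = 5, byrow = TRUE)

# as.data.frame() on an unnamed matrix yields columns V1..V5
data <- as.data.frame(mat)

# Paste "0" (the root) and every column together, whatever the number of levels
path <- do.call(paste, c("0", data, sep = "/"))
print(path[1])  # "0/7/23/47/41/31"
```

Assigning 'path' to data$pathString and calling as.Node(data) then proceeds as in the code above.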

I would like to modify the tree as follows:
(1) if a node at level 'n', labelled x, has only one child node at level 'n+1', labelled y, the program merges the two nodes into a single node labelled with the product x*y;
(2) if the node at level 'n+1' has no child nodes, the program does nothing and moves on to another branch;
(3) if the node at level 'n+1' has more than one child node, the program applies point (1) and starts again from each of the child nodes.
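The "only one child" condition in point (1) can be checked directly on the data frame. The sketch below is an illustration only: count_children is a hypothetical helper, not part of data.tree, and it assumes each row of 'data' is one root-to-leaf path with columns V1..V5 as above.

```r
mat <- matrix(c(
   7, 23, 47, 41, 31,
   7, 23, 53, 41, 31,
   7, 23, 53, 41, 37,
   7, 29, 47, 41, 31,
   7, 29, 47, 41, 37,
   7, 29, 53, 41, 31,
   7, 29, 53, 41, 37,
  11, 29, 53, 41, 31,
  11, 29, 53, 41, 37), ncol = 5, byrow = TRUE)
data <- as.data.frame(mat)

# Hypothetical helper: for every node at `level` (identified by its path
# from the top), count the distinct labels found at `level + 1`.
count_children <- function(df, level) {
  prefix <- do.call(paste, c(df[, seq_len(level), drop = FALSE], sep = "/"))
  tapply(df[[level + 1]], prefix, function(v) length(unique(v)))
}

print(count_children(data, 1))  # node 11 has 1 child, node 7 has 2
print(count_children(data, 3))  # every level-3 node has a single child (41)
```

Nodes reported with a count of 1 are the candidates for merging with their child.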

For the tree in our example, the code should produce:

[image: the desired tree]

Upvotes: 10


Answers (1)

Try this:

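library(data.tree)

# Assumes data holds only the numeric columns V1..V5;
# drop a pathString column left over from earlier code, if any
data$pathString <- NULL

# freq[r, i]: number of rows that share row r's values in columns 1..i
freq <- sapply(1:ncol(data), function(x) {
  df <- data[, 1:x, drop = FALSE]
  cc <- aggregate(df[, 1], as.list(df), FUN = length)
  merge(df, cc, by = colnames(df), sort = FALSE)[, "x"]
})

data$pathString <- sapply(1:nrow(data), function(x) {
  # g gives each level a group id; a level stays in its parent's group
  # when the row count is unchanged, i.e. it is an only child
  g <- 1
  for(i in 2:ncol(freq)) g <- c(g,
        if(freq[x, i] == freq[x, i - 1]) g[i - 1] else g[i - 1] + 1)

  # multiply the labels within each group and join the products into a path
  paste0(c("0", tapply(unlist(data[x, , drop = TRUE]), g, prod)), collapse = "/")
})

p_tree <- as.Node(data)

plot(p_tree)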

[image: the resulting tree]
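The idea behind the code above is that a node has a single child exactly when the number of rows sharing its path equals the number of rows sharing its child's path. Below is a base-R sketch of the same idea, offered as a variant rather than as the answer's own method: it counts rows with ave() instead of aggregate() plus merge(), since the R documentation describes the row order of merge(sort = FALSE) as unspecified. The names 'key' and 'paths' are illustrative, and 'data' is assumed to contain only V1..V5.

```r
mat <- matrix(c(
   7, 23, 47, 41, 31,
   7, 23, 53, 41, 31,
   7, 23, 53, 41, 37,
   7, 29, 47, 41, 31,
   7, 29, 47, 41, 37,
   7, 29, 53, 41, 31,
   7, 29, 53, 41, 37,
  11, 29, 53, 41, 31,
  11, 29, 53, 41, 37), ncol = 5, byrow = TRUE)
data <- as.data.frame(mat)

# freq[r, i]: number of rows sharing row r's values in columns 1..i,
# computed in place so the original row order is kept
freq <- sapply(seq_len(ncol(data)), function(i) {
  key <- do.call(paste, c(data[, seq_len(i), drop = FALSE], sep = "/"))
  ave(seq_along(key), key, FUN = length)
})

paths <- sapply(seq_len(nrow(data)), function(r) {
  # open a new group whenever the count changes, i.e. the parent branches
  g <- cumsum(c(TRUE, diff(freq[r, ]) != 0))
  # multiply the labels inside each group and join the products
  paste(c("0", tapply(unlist(data[r, ]), g, prod)), collapse = "/")
})

print(paths[1])  # "0/7/23/59737"  (47 * 41 * 31 merged)
print(paths[8])  # "0/693187/31"   (11 * 29 * 53 * 41 merged)
```

Assigning 'paths' to data$pathString and calling as.Node(data) would then build the reduced tree as before.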

Upvotes: 7
