cianius

Reputation: 2412

Plot a decision tree with R

I have a 440*2 matrix that looks like:

1   144
1   152
1   135
2   3
2   12
2   107
2   31
3   4
3   147
3   0
4   end
4   0
4   0
5   6
5   7
5   10
5   9

The left column holds the starting points, e.g. in the app all the 1's on the left would be on the same page. They lead to three choices: pages 144, 152 and 135. Each of these pages can then lead to another page, and so on until the right-hand column says 'end'. What I would like is a way to visualise the scale of this tree. I realise it will be quite large given the number of rows, so maybe not graph friendly. For clarity, I want to know how many possible routes there are in total (from every start point, down every option it gives, to each end destination). I realise there will be overlaps, but that's why I am finding this hard to calculate.

Secondly, each number has an associated title. I would like a function that, given a title, plots all the possible starting points and their associated paths that lead there. This should be a lot smaller and therefore graph friendly.

e.g.

dta <- "
14  12  as
186  187  Frac
187  154  Low
23   52   Med
52   11   Lip
15  55  asd
11   42   AAA
42   154   BBB
154   end  Coll"

Edited example data to show that some branches are not connected to the desired tree:

  dta <- "
  14  12  as
  186  187  Frac
  187  154  Low
  23   52   Med
  52   11   Lip
  11   42   AAA
  42   154   BBB
  154   end  Coll"

 dta <- gsub("   ", ",", dta, fixed = TRUE)   
 dta <- gsub("  ", ",", dta, fixed = TRUE)

df <- read.csv(textConnection(dta), stringsAsFactors = FALSE, header = FALSE)
names(df) <- c("from", "to", "nme")
library(data.tree)
Warning message:
package ‘data.tree’ was built under R version 3.2.5
tree <- FromDataFrameNetwork(df)
Error in FromDataFrameNetwork(df) :
  Cannot find root name. network is not a tree!

I made this example to show how column 1 leads to a value in column 2, which then refers to a value in column 1, until you reach the end. Different starting points can ultimately lead, via paths of different lengths, to the same destination. So this would look something like: all paths lead to Coll.

So here, I wanted to see how you could go from all start points to 'Coll'.

I would greatly appreciate any help.

Upvotes: 2

Views: 3029

Answers (1)

Christoph Glur

Reputation: 1244

If you do indeed have a tree (e.g. no cycles), you can use data.tree:

Start by converting to a data.frame:

 dta <- "
14  12  as
186  187  Frac
187  154  Low
23   52   Med
52   11   Lip
15  55  asd
11   42   AAA
42   154   BBB
154   end  Coll
55  end  efg
12  end  hij"

dta <- gsub("   ", ",", dta, fixed = TRUE)
dta <- gsub("  ", ",", dta, fixed = TRUE)


df <- read.csv(textConnection(dta), stringsAsFactors = FALSE, header = FALSE)
names(df) <- c("from", "to", "nme")
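The error in the question ("Cannot find root name") seems to arise because the network has no single root: as far as I can tell, the conversion needs exactly one node that every path ends in. The rows `55 end` and `12 end` added above give the stray branches that common end point. A minimal base R sketch of such a check follows; `edges`, `find_roots` and `multi_out` are hypothetical names used only for illustration, and this is not how data.tree itself necessarily does it.

```r
# Hypothetical edge list in the same from/to layout as `df` above
edges <- data.frame(
  from = c("14", "186", "187", "154"),
  to   = c("12", "187", "154", "end"),
  stringsAsFactors = FALSE
)

# Candidate roots: nodes that are pointed to but never point anywhere themselves
find_roots <- function(edges) setdiff(unique(edges$to), edges$from)

# Nodes with more than one outgoing edge (these also break the tree shape)
multi_out <- function(edges) unique(edges$from[duplicated(edges$from)])

roots_before <- find_roots(edges)   # two candidates: "12" and "end"

# Linking the dangling node "12" to "end" leaves a single root
fixed <- rbind(edges, data.frame(from = "12", to = "end", stringsAsFactors = FALSE))
roots_after <- find_roots(fixed)
```

If `find_roots()` returns more than one value, or `multi_out()` returns anything at all, the data is probably not a tree in the sense needed here.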

Now, convert to a data.tree:

library(data.tree)
tree <- FromDataFrameNetwork(df)

tree$leafCount
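In a tree, each leaf is one starting point with exactly one route to the root, so the leaf count is also the number of routes. If the real data has overlaps (a page offering several choices), it is no longer a tree and the routes need to be counted differently. The following is only a sketch in base R, assuming the graph has no cycles; `edges` and `count_routes` are made-up names, and the small edge list is invented to show an overlap.

```r
# Hypothetical edge list with an overlap: "1" reaches "3" directly and via "2"
edges <- data.frame(
  from = c("1", "1", "2", "3", "4"),
  to   = c("2", "3", "3", "4", "end"),
  stringsAsFactors = FALSE
)

count_routes <- function(edges, target = "end") {
  memo <- new.env()  # cache of already counted nodes
  # Number of distinct routes from `node` to `target`; assumes no cycles
  routes_from <- function(node) {
    if (node == target) return(1)
    if (!is.null(memo[[node]])) return(memo[[node]])
    nxt <- edges$to[edges$from == node]
    # A node with no outgoing rows is a dead end and contributes 0 routes
    total <- sum(vapply(nxt, routes_from, numeric(1)))
    memo[[node]] <- total
    total
  }
  # Start points: nodes that no other node leads to
  starts <- setdiff(unique(edges$from), edges$to)
  vapply(starts, routes_from, numeric(1))
}

routes <- count_routes(edges)  # named vector: routes per start point
total_routes <- sum(routes)
```

The cache keeps the work roughly proportional to the number of rows, which should matter little for 440 rows but avoids recounting shared sub-paths.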

You can now navigate to any sub-tree, for analysis and plotting. E.g. using any of the following possibilities:

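# Three alternative ways to reach node 187 ("Low"):
subTree <- tree$FindNode(187)
subTree <- Climb(tree, nme = "Coll", nme = "Low")
subTree <- tree$`154`$`187`

# Or take an independent copy of the whole branch below 154 ("Coll"):
subTree <- Clone(tree$`154`)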

Maybe printing is all you need:

print(subTree , "nme")

This will print like so:

  levelName          nme
1 154                Coll
2  ¦--187             Low
3  ¦   °--186        Frac
4  °--42              BBB
5      °--11          AAA
6          °--52      Lip
7              °--23  Med

Otherwise, use fancy plotting:

SetNodeStyle(subTree , style = "filled,rounded", shape = "box", fontname = "helvetica", label = function(node) node$nme, tooltip = "name")
plot(subTree , direction = "descend")

This looks like this:

[Image: rendered plot of subTree]
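For the second part of the question (give a title, get every start point and path that leads there), one option is to cut the data frame down before building the tree. The sketch below uses base R only; `paths_to_title` is a hypothetical helper, not part of data.tree, and it assumes each title belongs to the `from` node of its row, as in the example data.

```r
# Same example rows as above, as a plain data frame
df <- data.frame(
  from = c("14", "186", "187", "23", "52", "15", "11", "42", "154", "55", "12"),
  to   = c("12", "187", "154", "52", "11", "55", "42", "154", "end", "end", "end"),
  nme  = c("as", "Frac", "Low", "Med", "Lip", "asd", "AAA", "BBB", "Coll", "efg", "hij"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: all rows lying on some path that leads to `title`
paths_to_title <- function(df, title) {
  reached <- df$from[df$nme == title]   # node id(s) carrying the title
  frontier <- reached
  while (length(frontier) > 0) {
    feeders <- df$from[df$to %in% frontier]  # nodes one step upstream
    frontier <- setdiff(feeders, reached)    # keep only newly found nodes
    reached <- c(reached, frontier)
  }
  df[df$from %in% reached, ]
}

sub_df <- paths_to_title(df, "Coll")
# Start points: nodes in the subset that nothing else leads to
start_points <- setdiff(sub_df$from, sub_df$to)
```

The resulting `sub_df` keeps the row that links the titled node onward, so it should still have a single root and could then be converted and plotted in the same way as the full data.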

Upvotes: 3
