Tyler

Reputation: 1050

How to get percentages from decision tree for each node

How could I create a table that includes the percentages for each node in the plot below?

library(rpart)
library(rattle)
library(rpart.plot)
library(RColorBrewer)

fit <- rpart(Species ~ ., data=iris, method="class")
fancyRpartPlot(fit)

It results in this plot:

[image: decision tree plot produced by fancyRpartPlot(fit)]

I would like to output a table with species as the first column and the associated percent at each node in a second column. A second iteration of the table would exclude the first node (100%) and also remove duplicates by retaining the row that contains a higher percentage.

After picking through the "rpart" documentation, I'm still unable to figure out how to create this table. Please let me know what you think.

Thank you for your time.

Upvotes: 2

Views: 2484

Answers (1)

IRTFM

Reputation: 263391

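The where element of the rpart object is not the predicted class: for each observation it gives the row of fit$frame for the terminal node (leaf) that the observation falls into. Note that these are row indices into fit$frame rather than the node numbers shown in the plot. You can cross-tabulate it against the species with: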

> iris$where <- fit$where
> with(iris, table(Species, where))
            where
Species       2  4  5
  setosa     50  0  0
  versicolor  0 49  1
  virginica   0  5 45
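
If you want the columns labelled with the node numbers from the plot rather than row indices, something along these lines may help. It is only a sketch: it assumes the row names of fit$frame are the node numbers, and the variable names node and counts are just illustrative.

```r
library(rpart)

fit <- rpart(Species ~ ., data = iris, method = "class")

# fit$where holds row indices into fit$frame; the row names of fit$frame
# are the node numbers, so look them up to relabel each observation.
node <- as.integer(rownames(fit$frame)[fit$where])

# Counts of each species in each terminal node, labelled by node number.
counts <- table(Species = iris$Species, node = node)

# Percentage of each species within each terminal node (each column sums to 100).
round(100 * prop.table(counts, margin = 2), 1)
```

Using margin = 1 instead would give the share of each species that lands in each leaf.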

I'm guessing you want the column sums divided by the total counts?

> 100*colSums(with(iris, table(Species, where)) )/150
       2        4        5 
33.33333 36.00000 30.66667 
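
If the goal is the table described in the question (predicted species and percent of observations for every node, not just the leaves), a rough sketch is to read it from fit$frame, which has one row per node. This assumes the percent wanted is each node's share of all observations; node_table, no_root and best are names made up for this example.

```r
library(rpart)

fit <- rpart(Species ~ ., data = iris, method = "class")

# One row per node; row names are the node numbers.
frame <- fit$frame

node_table <- data.frame(
  node    = as.integer(rownames(frame)),
  # yval is the integer code of the predicted class; map it to the level name.
  Species = attr(fit, "ylevels")[frame$yval],
  # Share of all observations that reach this node (the root has them all).
  percent = round(100 * frame$n / frame$n[1], 1),
  leaf    = frame$var == "<leaf>"
)
node_table

# Second iteration: drop the root node, then keep only the
# highest-percent row for each species.
no_root <- node_table[node_table$node != 1, ]
no_root <- no_root[order(no_root$Species, -no_root$percent), ]
best    <- no_root[!duplicated(no_root$Species), ]
best
```

Filtering node_table on the leaf column instead would restrict the table to terminal nodes only.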

Upvotes: 1
