Reputation: 89
I'm new to R and rpart package. I want to create a tree using the following sample data.
My data set is similar to this mydata =
"","A","B","C","status"
"1",TRUE,TRUE,TRUE,"okay"
"2",TRUE,TRUE,FALSE,"okay"
"3",TRUE,FALSE,TRUE,"okay"
"4",TRUE,FALSE,FALSE,"notokay"
"5",FALSE,TRUE,TRUE,"notokay"
"6",FALSE,TRUE,FALSE,"notokay"
"7",FALSE,FALSE,TRUE,"okay"
"8",FALSE,FALSE,FALSE,"okay"
fit <- rpart(status ~ A + B + C, data = mydata, method = "class")
or I tried with different formulas and different methods in this. But always only the root node is produced. no plot possible. its showing
fit
n= 8
node), split, n, loss, yval, (yprob)
* denotes terminal node
1) root 8 3 okay (0.3750000 0.6250000) *
How to create the tree.? I need to show percentage of "okay" and "notokay" on each node. and i need to specify one out of A, B or C for spliting and show the statistics
Upvotes: 0
Views: 2205
Reputation: 17168
With the default settings of rpart()
no splits are considered at all. The minsplit
parameter is 20
by default (see ?rpart.control
) which is "the minimum number of observations that must exist in a node in order for a split to be attempted." So for your 8 observations no splitting is even considered.
If you are determined to consider splitting, then you could decrease the minbucket
and/or minsplit
parameters. For example
fit <- rpart(status ~ A + B + C, data = mydata,
control = rpart.control(minsplit = 3))
produces the following tree:
The display is created by
plot(partykit::as.party(fit), tp_args = list(beside = TRUE))
and the print output from rpart
is:
n= 8
node), split, n, loss, yval, (yprob)
* denotes terminal node
1) root 8 3 okay (0.3750000 0.6250000)
2) A=FALSE 4 2 notokay (0.5000000 0.5000000)
4) B=TRUE 2 0 notokay (1.0000000 0.0000000) *
5) B=FALSE 2 0 okay (0.0000000 1.0000000) *
3) A=TRUE 4 1 okay (0.2500000 0.7500000) *
Whether or not this is particularly useful is a different question though...
Upvotes: 1