Reputation: 41
How can we specify the 'minsplit' parameter when using the 'rpart' package to fit a decision tree?
rpart(myFormula, data=train, control=rpart.control(minsplit=10))
Upvotes: 4
Views: 11007
Reputation: 211
minsplit: the minimum number of observations that must exist in a node in order for a split to be attempted. (https://stat.ethz.ch/R-manual/R-devel/library/rpart/html/rpart.control.html)
You can override the minsplit control parameter by passing your own value through rpart.control(). Be aware that lowering it can produce an overfitted decision tree. For example, if you have too few data points to grow a tree with rpart's default parameters, you can lower minsplit and minbucket so that a tree can be created.
Choose the value after looking at your data set, in particular how many observations it has.
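To illustrate the small-data case, here is a minimal sketch using a made-up 10-row data frame (the names `small`, `x`, and `y` are hypothetical and only for illustration). With the default minsplit of 20, the root node holds fewer than 20 observations, so no split is attempted; lowering minsplit and minbucket lets rpart split.

```r
library(rpart)

# Hypothetical toy data: 10 rows, perfectly separable on x
small <- data.frame(
  x = 1:10,
  y = factor(c(rep("a", 5), rep("b", 5)))
)

# Default control (minsplit = 20): the root has only 10 observations,
# so rpart never attempts a split and returns a root-only tree
fit_default <- rpart(y ~ x, data = small, method = "class")

# Lowered thresholds: a split is now allowed
fit_small <- rpart(y ~ x, data = small, method = "class",
                   control = rpart.control(minsplit = 4, minbucket = 2))

# Number of nodes in each tree (one row of $frame per node)
nodes_default <- nrow(fit_default$frame)
nodes_small   <- nrow(fit_small$frame)
```

Comparing `nodes_default` with `nodes_small` shows whether the lowered thresholds actually let the tree grow beyond the root.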
rpart's default values: minsplit = 20, minbucket = round(minsplit/3)
tree <- rpart(outcome ~ ., method = "class", data = data, control = rpart.control(minsplit = 1, minbucket = 1, cp = 0))
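As a rough way to see the overfitting risk, the sketch below compares tree sizes on the `kyphosis` data set that ships with rpart. The helper `n_leaves` is a hypothetical name introduced here, not part of the package; it counts terminal nodes from the fitted object's `frame`.

```r
library(rpart)  # the kyphosis data set is bundled with rpart

# Defaults as reported by rpart.control() itself
defaults <- rpart.control()
defaults$minsplit   # 20
defaults$minbucket  # round(20 / 3) = 7

# Tree grown with the default control settings
fit_default <- rpart(Kyphosis ~ Age + Number + Start,
                     data = kyphosis, method = "class")

# Tree grown with very permissive settings, as in the answer above
fit_relaxed <- rpart(Kyphosis ~ Age + Number + Start,
                     data = kyphosis, method = "class",
                     control = rpart.control(minsplit = 2, minbucket = 1, cp = 0))

# Hypothetical helper: count terminal nodes (leaves) of a fitted tree
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

leaves_default <- n_leaves(fit_default)
leaves_relaxed <- n_leaves(fit_relaxed)
```

The relaxed tree typically has many more leaves than the default one. Inspecting it with printcp(fit_relaxed) and pruning back with prune() is one common way to rein in the extra complexity.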
Upvotes: 4