Reputation: 733
I'm building a decision tree with rpart via the caret::train function. I want to set rpart's minsplit parameter to 1 so that the tree is grown fully and can then be pruned with cp. As I understand it, such parameters should be passed through the ... argument of train, but this doesn't work. A minimal reproducible example:
library(caret)
library(rpart)
mod1 <- train(Species ~ ., iris, method = "rpart", tuneGrid = expand.grid(cp = 0), minsplit = 1)
mod2 <- rpart(Species ~ ., iris, cp = 0, minsplit = 1)
The result is that mod1$finalModel and mod2 are quite different. I would like mod1$finalModel to look like mod2 (i.e., completely overfitted). I cannot pass the parameter through tuneGrid either, since it only accepts a cp column.
So my question is: is there any way in caret to pass the argument minsplit = 1 to the train function and then cross-validate over the cp parameter?
Upvotes: 3
Views: 4200
Reputation: 161
I suppose control = rpart.control() is needed to pass minsplit and minbucket through the {caret} train function, since that is the correct way to set them in rpart itself, and train forwards the argument to rpart via its '...'. Best, G
Upvotes: 0
Reputation: 733
Ok, thanks to this post I figured out how to do it:
mod1 <- train(Species ~ ., iris, method = "rpart",
control = rpart.control(minsplit = 1, minbucket = 1))
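To also cross-validate over cp, as the question asks, the same control argument can be combined with a tuneGrid. This is only a sketch: the cp grid values, the 10-fold setting, the seed, and the names cp_grid and mod_cv are illustrative assumptions, not something from the posts above.

```r
library(caret)
library(rpart)

set.seed(42)  # arbitrary seed so the CV folds are reproducible

# Hypothetical grid of complexity parameters to evaluate
cp_grid <- expand.grid(cp = seq(0, 0.1, by = 0.01))

mod_cv <- train(
  Species ~ ., data = iris,
  method    = "rpart",
  trControl = trainControl(method = "cv", number = 10),  # 10-fold CV
  tuneGrid  = cp_grid,
  # Let each tree grow fully; cp is taken from the grid instead
  control   = rpart.control(minsplit = 1, minbucket = 1)
)

mod_cv$bestTune    # cp value selected by cross-validation
mod_cv$finalModel  # rpart tree refit on all data with that cp
```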
I'm still not quite sure why the argument has to be passed via control = rpart.control(). Passing just the arguments minsplit = 1, minbucket = 1 directly to the train function simply doesn't work.
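A possible explanation, offered as an assumption rather than a quote from caret's source: the rpart wrapper in caret appears to always hand rpart a control object of its own (so it can set cp), and inside rpart an explicit control list takes precedence over control-style arguments given by name. The second half can be checked with rpart alone; the names fit_a and fit_b below are just for illustration.

```r
library(rpart)

# Both a control list and a direct minsplit argument are supplied.
# The control list (with its default minsplit) overrides minsplit = 1.
fit_a <- rpart(Species ~ ., iris,
               control = rpart.control(cp = 0), minsplit = 1)

# Only direct arguments are supplied, so minsplit = 1 is honoured.
fit_b <- rpart(Species ~ ., iris, cp = 0, minsplit = 1)

fit_a$control$minsplit  # the rpart.control default, not 1
fit_b$control$minsplit  # 1
```

If that reading is right, a bare minsplit = 1 passed to train would be overwritten in the same way, while values placed inside control = rpart.control(...) survive.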
Upvotes: 6