Reputation: 43
I was wondering, when we use bagging for classification, what parameters can be tuned, and can we use cross-validation to tune them?
The documentation for the bagging function in R says we can use nbagg to change the number of bootstrap replications, and rpart.control to control the individual trees.
Here's my code
library(ipred)
bagging(income ~ ., data = training3, coob = TRUE)
Upvotes: 1
Views: 4078
Reputation: 11128
When to use bagging in classification?
Bagging essentially takes repeated samples from a single training set in order to generate some number of different bootstrapped training data sets. We then train our method on each of these sets and average the predictions in the case of regression, or take a majority vote in the case of classification. Bagging helps reduce the variance of an outcome, so when your results have very high variance, bagging is often a good choice; that said, bagging techniques are often outperformed by random forests and boosting.
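To make the mechanism concrete, here is a minimal hand-rolled sketch of bagging for classification, assuming the question's training3 data frame and income outcome (predict_bagged is a hypothetical helper written for this answer, not a package function):

library(rpart)

set.seed(42)
n_trees <- 25
trees <- vector("list", n_trees)
for (i in seq_len(n_trees)) {
  # draw a bootstrap sample: n rows sampled with replacement
  idx <- sample(nrow(training3), replace = TRUE)
  # grow one classification tree on that sample
  trees[[i]] <- rpart(income ~ ., data = training3[idx, ], method = "class")
}

# classify new rows by majority vote across all the trees
predict_bagged <- function(trees, newdata) {
  votes <- sapply(trees, function(tr) as.character(predict(tr, newdata, type = "class")))
  apply(votes, 1, function(v) names(which.max(table(v))))
}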
What parameters can be tuned, and can we use cross-validation to tune them?
nbagg: controls the number of decision trees voting in the ensemble (the default is 25). Depending on the difficulty of the learning task and the amount of training data, increasing this number may improve the model's performance, but it requires additional computational expense.
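As a quick, non-authoritative way to pick nbagg, you can compare the out-of-bag error for a few values; this sketch assumes the question's training3 / income and ipred's bagging, whose coob = TRUE fit stores the OOB error in $err:

library(ipred)

set.seed(1729)
for (k in c(10, 25, 50, 100)) {
  fit <- bagging(income ~ ., data = training3, nbagg = k, coob = TRUE)
  cat("nbagg =", k, "OOB error =", fit$err, "\n")
}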
cp: the complexity parameter; tuned properly, it yields a pruned tree. cp controls the size of the decision tree and is used to select the optimal tree size: if the cost of adding another split from the current node is above the value of cp, tree building does not continue. You can try a few values and plot the results to see what fits your data.
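If I read ipred's documentation correctly, bagging accepts a control argument that is forwarded to rpart, so a sketch of setting cp might look like this (the cp value here is purely illustrative, not a recommendation):

library(ipred)
library(rpart)

set.seed(1729)
fit_cp <- bagging(income ~ ., data = training3,
                  nbagg = 25, coob = TRUE,
                  control = rpart.control(cp = 0.01, xval = 0))
fit_cp$err  # out-of-bag misclassification error at this cp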
Can we use cross-validation?
Yes, you can. However, you should use the caret package to do it (just to make your life simpler):
library(caret)
set.seed(1729)
cntrl <- trainControl(method = "cv", number = 10)
train(dependent_variable ~ ., data = mydata, method = "treebag",
      trControl = cntrl)
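One follow-up note: as far as I know, caret's "treebag" method has no tuning grid of its own, so the cross-validation above estimates performance rather than searching over parameters; extra arguments such as nbagg should be forwarded to the underlying bagging call. A hedged sketch (fit is just a name chosen here, and nbagg = 50 is an illustrative value):

fit <- train(dependent_variable ~ ., data = mydata, method = "treebag",
             trControl = cntrl, nbagg = 50)
print(fit)  # reports the cross-validated Accuracy and Kappa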
Upvotes: 3