Reputation: 314
I'm using the "party" package to create a random forest of regression trees. I've created a ForestControl object in order to limit the number of trees (ntree), the tree depth (maxdepth) and the number of variables used to fit each tree (mtry). One thing I'm not sure of is whether the cforest algorithm uses subsets of my training set for each tree it generates or not.
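Roughly, my setup looks like the sketch below (the data set and the exact control values are just placeholders for what I actually use):

    library(party)

    # placeholder data: regress Ozone on the remaining airquality columns
    aq <- na.omit(airquality)

    # ForestControl object limiting ntree, mtry and maxdepth
    # (maxdepth is assumed to be passed through to the underlying tree controls)
    ctrl <- cforest_control(ntree = 50, mtry = 3, maxdepth = 4)

    cf <- cforest(Ozone ~ ., data = aq, controls = ctrl)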
I've seen in the documentation that it does bagging, so I assume it should, but I'm not sure I understand what the "subset" input to that function is.
I'm also puzzled by the results I get using ctree: when plotting the tree, I see that all the observations of my training set are classified into the different terminal nodes, whereas I would have expected it to use only a subset here too.
So my question is: is cforest doing the same thing as ctree, or is it really bagging my training set?
Thanks in advance for your help!
Ben
Upvotes: 3
Views: 532
Reputation: 47
If mtry is set to the number of variables in the dataset (or to Inf), cforest performs 'bagging' (bootstrap aggregation; see the cforest documentation). Bagging helps to overcome the limitation that individual decision trees are sensitive to changes in the training data (QuantAcademy, Bootstrap Aggregation, Random Forests and Boosted Trees): each tree is fit to a bootstrap sample of the training data, and the predictions of the resulting ensemble are aggregated. Although ctree is a type of decision tree that uses a conditional inference framework to reduce bias, it is presumably still a "high-variance estimator" [2], so it benefits from this aggregation. In this case, however, each individual tree is built from all of the input features.
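As a minimal sketch of the bagging case (assuming the built-in airquality data and cforest_unbiased for the controls; neither comes from the question):

    library(party)

    aq <- na.omit(airquality)
    n_pred <- ncol(aq) - 1   # number of predictor variables (5 here)

    # mtry equal to the number of predictors: every tree may use all of the
    # features, so the only randomness comes from resampling the rows of the
    # training data, i.e. plain bagging of conditional inference trees
    bagged <- cforest(Ozone ~ ., data = aq,
                      controls = cforest_unbiased(ntree = 100, mtry = n_pred))

Whether the rows are drawn with replacement (a true bootstrap) or as subsamples without replacement is governed by the replace and fraction arguments of the control function.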
Otherwise (if mtry is less than the number of variables), cforest creates a random forest based on conditional inference trees [1]. A random forest extends bagging with random feature selection over a (typically) large number of trees, before once again aggregating the results from the individual trees: each tree is fit to a random subset of the training data, and each split considers only a random selection of mtry variables.
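And a corresponding sketch for the random forest case (same assumed data as above):

    library(party)

    aq <- na.omit(airquality)

    # mtry smaller than the number of predictors (here 2 of 5): each split
    # considers only a random subset of mtry variables, on top of the row
    # resampling, which is the usual random forest behaviour
    rf <- cforest(Ozone ~ ., data = aq,
                  controls = cforest_unbiased(ntree = 100, mtry = 2))

    # aggregated out-of-bag predictions from the ensemble
    head(predict(rf, OOB = TRUE))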
Upvotes: 0