Reputation: 523
I am trying to change how the random forest algorithm chooses the feature subset at each node. The original algorithm, as implemented in scikit-learn, draws that subset at random. I want to specify which subset is used at each new node, choosing among several predefined subsets. Is there a direct way to control this in scikit-learn? If not, is there a way to modify the scikit-learn source code to do it? If so, which function in the source do you think should be changed?
Upvotes: 2
Views: 822
Reputation: 2487
Short version: This is all you.
I assume by "subsetting features for every node" you are referring to the random selection of a subset of samples and possibly features used to train individual trees in the forest. If that's what you mean, then you aren't building a random forest; you want to make a nonrandom forest of particular trees.
One way to do that is to build each DecisionTreeClassifier individually using your carefully specified subset of features, then use the VotingClassifier to combine the trees into a forest. (That feature is only available in 0.17/dev, so you may have to build your own, but it is super simple to build a voting classifier estimator class.)
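Here is a minimal sketch of that idea, assuming a scikit-learn version recent enough to ship both VotingClassifier and ColumnTransformer. The iris data and the three feature subsets are placeholders I made up for illustration. Since VotingClassifier fits every estimator on the same X, each tree is wrapped in a pipeline that first keeps only its own columns. Note that this fixes the feature subset per tree, not per node.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hypothetical hand-picked feature subsets (column indices), one per tree.
feature_subsets = [[0, 1], [2, 3], [0, 2, 3]]

estimators = []
for i, cols in enumerate(feature_subsets):
    # Keep only the listed columns; the default remainder="drop" discards the rest.
    select = ColumnTransformer([("select", "passthrough", cols)])
    tree = make_pipeline(select, DecisionTreeClassifier(random_state=0))
    estimators.append((f"tree_{i}", tree))

# Hard voting: each tree casts one vote and the majority class wins.
forest = VotingClassifier(estimators=estimators, voting="hard")
forest.fit(X, y)
predictions = forest.predict(X)
```

If you also want each tree trained on its own subset of samples, you would have to fit the trees yourself and combine their votes by hand, because VotingClassifier passes the same rows to every estimator.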
Upvotes: 1