Havsula

Reputation: 43

Force Split random forest

I am using a random forest in scikit-learn. Is it possible to force a split on a certain binary feature? I have a dataset where one of the features is sex (man or woman). I have found that the two groups differ so much that the first split should be on sex. I could of course build two models, but it would be more practical to have one model.

Upvotes: 1

Views: 1614

Answers (2)

Edden Gerber

Reputation: 51

What the original post suggests may actually be a good idea in some cases. Since Random Forest splits greedily based on the most informative features, it may under-perform a model that first splits on a feature that is less informative but that separates the data into two sets that behave differently in a way that justifies different models.

This is a video that demonstrates exactly this - it is in Hebrew but if you follow the on-screen notebook you can see how they show it: https://www.youtube.com/watch?v=LAJW18ITymM

(tl;dr - a simple Decision Tree gives a classification accuracy of 0.74 but when split on the 4th most important feature into two separate trees, they each give a 0.85 accuracy)
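For readers who want the "two models, but used as one" behaviour, here is a minimal sketch, not a scikit-learn feature. The class name `SplitOnBinaryFeature`, its `split_col` parameter, and the synthetic data are all hypothetical and exist only for illustration. It trains one forest per value of the chosen column and routes each row to the matching forest at prediction time.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier


class SplitOnBinaryFeature:
    """Hypothetical wrapper: one sub-model per value of a chosen column."""

    def __init__(self, base_estimator=None, split_col=0):
        self.base_estimator = base_estimator
        self.split_col = split_col

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        base = self.base_estimator
        if base is None:
            base = RandomForestClassifier(n_estimators=50, random_state=0)
        self.classes_ = np.unique(y)
        self.models_ = {}
        # The "forced first split": partition the rows by the chosen column
        # and fit an independent copy of the base model on each partition.
        for value in np.unique(X[:, self.split_col]):
            mask = X[:, self.split_col] == value
            self.models_[value] = clone(base).fit(X[mask], y[mask])
        return self

    def predict(self, X):
        X = np.asarray(X)
        out = np.empty(X.shape[0], dtype=self.classes_.dtype)
        # Route each row to the sub-model trained on its group.
        # Assumption: every value seen here was also seen during fit.
        for value, model in self.models_.items():
            mask = X[:, self.split_col] == value
            if mask.any():
                out[mask] = model.predict(X[mask])
        return out


# Synthetic data (an assumption): the effect of x1 flips with sex.
rng = np.random.default_rng(0)
sex = rng.integers(0, 2, size=400)
x1 = rng.normal(size=400)
y = np.where(sex == 1, x1 > 0, x1 < 0).astype(int)
X = np.column_stack([sex, x1])

model = SplitOnBinaryFeature(split_col=0).fit(X, y)
preds = model.predict(X)
```

Whether this beats a single forest depends on the data, so it is worth comparing both on held-out data before committing to it.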

Upvotes: 1

Chris

Reputation: 967

In short, no.

However, your question suggests you do not fully understand how a Random Forest works.

I suggest reading https://citizennet.com/blog/2012/11/10/random-forests-ensembles-and-performance-metrics/

Each split is chosen greedily to maximise the separation between the resulting groups (for example, the reduction in impurity or variance). As such, if the feature you mention is truly predictive, the trees should split on that feature at some point (how early depends on the predictive power of the other features).
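As a rough illustration of how a greedy split is scored, the sketch below computes the decrease in Gini impurity for a candidate binary feature. The helper names `gini` and `impurity_decrease` and the toy arrays are hypothetical, and this is a simplified version of the idea rather than scikit-learn's actual implementation.

```python
import numpy as np


def gini(labels):
    """Gini impurity of a set of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def impurity_decrease(feature, labels):
    """Gini decrease obtained by splitting on a 0/1 feature."""
    feature, labels = np.asarray(feature), np.asarray(labels)
    left, right = labels[feature == 0], labels[feature == 1]
    # Child impurities are weighted by the share of rows in each child.
    weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
    return gini(labels) - weighted


labels = np.array([0, 0, 1, 1])
perfect = np.array([0, 0, 1, 1])   # separates the classes completely
useless = np.array([0, 1, 0, 1])   # each child keeps a 50/50 class mix

gain_perfect = impurity_decrease(perfect, labels)
gain_useless = impurity_decrease(useless, labels)
```

A greedy tree picks the candidate with the largest decrease at each node, which is why a feature that only pays off in combination with later splits can be passed over at the root.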

Additionally, all tree models in sklearn can export their splits, so you can fit a tree and check what is happening.

http://scikit-learn.org/stable/modules/generated/sklearn.tree.export_graphviz.html
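A minimal sketch of that check, assuming synthetic data and made-up feature names (`sex`, `x1`, `x2`): it fits a forest, counts which feature each tree uses at its root via `tree_.feature[0]`, and exports one tree with `export_graphviz`.

```python
from collections import Counter

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_graphviz

# Synthetic data (an assumption): the effect of x1 flips with sex.
rng = np.random.default_rng(0)
n = 500
sex = rng.integers(0, 2, size=n)
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = np.where(sex == 1, x1 > 0, x1 < 0).astype(int)
X = np.column_stack([sex, x1, x2])
feature_names = ["sex", "x1", "x2"]

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Node 0 is the root of each tree; tree_.feature holds the index of the
# feature used at each node (negative for a leaf, which has no split).
root_features = Counter()
for est in forest.estimators_:
    idx = est.tree_.feature[0]
    root_features[feature_names[idx] if idx >= 0 else "leaf"] += 1

# With out_file=None, export_graphviz returns the DOT source as a string.
dot_source = export_graphviz(
    forest.estimators_[0], out_file=None, feature_names=feature_names
)
```

Printing `root_features` shows how often each feature was chosen for the first split. The counts depend on the data and the random seed, so treat them as a diagnostic rather than a guarantee.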

Chapter 9 in The Elements of Statistical Learning (which is available for free download on the authors' website) covers the theory in greater depth if you wish to know more.

Upvotes: 1
