avinj86

Reputation: 45

How can I specify splits in decision tree?

I am trying to train a decision tree classifier to evaluate baseball players using scikit-learn. However, I would like to "pre-specify" or "force" certain splits ahead of time, based on what I know to be true about the way experts think (these rules need to be incorporated regardless of what the data says). For example, I want to force a split on batting average > .300.

A related question: can I "pre-load" a previously trained decision tree model and merely "update" it in subsequent training, or does the decision tree classifier need to re-learn all the rules each time I run it? The analogy I'm trying to make is to transfer learning, but applied to decision trees.

Upvotes: 2

Views: 4023

Answers (1)

Craig

Reputation: 346

The way I pre-specify splits is to create multiple trees. Separate the players into two groups, those with avg > 0.3 and those with avg <= 0.3, then create and test a tree on each group. During scoring, a simple if-then-else can send each player to tree1 or tree2.

The advantage of this approach is that your code is very explicit. It is also a good way to test these expert rules: build a single tree without the rule, then build the two trees and compare.

The disadvantage is that if you have many rules, this becomes quite burdensome: many trees and many if-then-else branches to maintain, and possibly small samples for training each tree. And maybe not all of the expert rules are optimal anyway.
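The approach above can be sketched as follows. This is a minimal illustration with synthetic data; the feature layout (batting average in column 0), the toy target, and the helper names are assumptions for the example, not part of your actual pipeline:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic player data: column 0 is batting average, column 1 is some
# other stat. The target here is a toy label just to make the demo run.
X = np.column_stack([rng.uniform(0.150, 0.400, 200), rng.normal(size=200)])
y = (X[:, 0] + 0.1 * X[:, 1] > 0.300).astype(int)

# Forced split: partition the players on batting average > .300 and
# train a separate tree on each side of the split.
high = X[:, 0] > 0.300
tree_high = DecisionTreeClassifier(max_depth=3).fit(X[high], y[high])
tree_low = DecisionTreeClassifier(max_depth=3).fit(X[~high], y[~high])

def predict(samples):
    """Route each player to the tree matching the forced split, then
    combine the two trees' predictions back into one array."""
    samples = np.asarray(samples)
    out = np.empty(len(samples), dtype=int)
    mask = samples[:, 0] > 0.300
    if mask.any():
        out[mask] = tree_high.predict(samples[mask])
    if (~mask).any():
        out[~mask] = tree_low.predict(samples[~mask])
    return out
```

The if-then-else lives in `predict`, so the forced rule is explicit in one place, and you can swap either sub-tree independently when testing the expert rule against a single unconstrained tree.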

Upvotes: 1
