HammockKing

Reputation: 77

Rapidminer: Explaining decision tree parameters

I am very new to RapidMiner and data mining in general. I did a cursory search for what the parameters of RapidMiner's decision tree operator mean and came up lacking. I know what a leaf and a node are, and I am starting to get my head around a few of the parameters, but any knowledge shared would be appreciated. In other words, what do they all really do: criterion, minimal size for split, minimal leaf size, minimal gain, maximal depth, confidence?

Also, without using optimization, is trial and error the best way to get the best prediction? Thanks, S

Upvotes: 3

Views: 4319

Answers (1)

ahoffer

Reputation: 6526

I like to use the RAPIDMINER OPERATOR REFERENCE. It is a PDF file available here: http://rapidminer.com/documentation/

The information in this document is better than the information in the application itself. For example: "...there are less than a certain number of instances or examples in the current subtree. This can be adjusted by using the minimal size for split parameter."

Let's say your labels are "blue", "red" and "green". Your decision tree has a node with 2 "green" examples and 1 "blue" example. If minimal size for split is 4, the decision tree will not create a new branch, because there are only three examples in the node. Even though the answer isn't perfect, it will declare the node to be a leaf that classifies all examples as "green".
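To make the example above concrete, here is a minimal Python sketch of that check. It is only an illustration of the idea, not RapidMiner's actual code; the function name `decide_node` and its return values are made up for this example.

```python
from collections import Counter

def decide_node(labels, minimal_size_for_split=4):
    """Hypothetical helper: decide whether a node may be split further.

    Returns ("leaf", majority_label) when the node holds fewer examples
    than minimal_size_for_split, otherwise ("split", None).
    """
    if len(labels) < minimal_size_for_split:
        # Too few examples: stop here and predict the most common label.
        majority_label, _ = Counter(labels).most_common(1)[0]
        return ("leaf", majority_label)
    return ("split", None)

# The node from the example: 2 "green" and 1 "blue".
node = ["green", "green", "blue"]
print(decide_node(node, minimal_size_for_split=4))
```

With three examples and a threshold of 4, the node becomes a "green" leaf. Lowering the threshold to 3 or less would let the tree try to split it.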

minimal leaf size is similar. A decision tree where every branch leads to a single example is not very useful, even though it might classify the training data most accurately. Therefore you can set the minimum number of examples that a leaf in the tree must classify. A good value depends on your data set and your needs. Run the decision tree, and if too many leaves contain only a few examples, increase the value of this parameter.
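As a rough sketch of how a minimum leaf size can work, assume a candidate split is rejected whenever one of the leaves it would create is too small. This is my reading of the parameter, not RapidMiner's implementation, and `split_allowed` is a hypothetical name.

```python
def split_allowed(child_sizes, minimal_leaf_size=2):
    """Hypothetical helper: accept a candidate split only if every
    resulting leaf would hold at least minimal_leaf_size examples."""
    return all(size >= minimal_leaf_size for size in child_sizes)

# A split that would leave one leaf with a single example is rejected.
print(split_allowed([5, 1], minimal_leaf_size=2))

# A split into two leaves of three examples each is accepted.
print(split_allowed([3, 3], minimal_leaf_size=2))
```

Raising `minimal_leaf_size` rejects more candidate splits, which gives a smaller tree with fewer tiny leaves.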

criterion and minimal gain are a little more complicated. The criterion is the measure RapidMiner uses to judge how good a decision tree and its nodes are. There are several strategies, and I do not know much about how they work. The criterion is one of the things RapidMiner uses to decide whether it should create a sub-tree under a node or declare the node to be a leaf. It should also control how many branches extend from a sub-tree's root node.
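To give a feel for how a criterion and minimal gain might interact, here is a sketch that uses information gain, one common splitting criterion. It is the generic textbook calculation, not RapidMiner's code. The function names and the 0.1 threshold are assumptions made up for this example, and I am assuming a split is only kept when its gain exceeds minimal gain.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# A split that separates the two labels perfectly.
parent = ["green", "green", "blue", "blue"]
children = [["green", "green"], ["blue", "blue"]]

gain = information_gain(parent, children)
minimal_gain = 0.1  # assumed threshold, for illustration only

# Keep the split only if it improves the criterion by more than minimal_gain.
print(gain, gain > minimal_gain)
```

A higher minimal gain means fewer splits pass the test, so the tree ends up smaller. A split that does not separate the labels at all has a gain of zero and would be rejected by any positive threshold.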

There are more options for decision trees, and each kind of decision tree can have different parameters. I learned about them by reading the description of a parameter, hypothesizing what would happen if I changed the parameter, and then creating a new decision tree to see if my hypothesis was right. Experiment and have fun!

Upvotes: 1
