Reputation: 3124
I am building a data mining model using a decision tree. If I have a binary attribute like MALE and FEMALE, I know the Gender node will have two branches when splitting. But what if I have a continuous attribute, say a float between 0 and 1? Do I map it to discrete values like LOW (0 - 0.5) and HIGH (0.5 - 1)? Or is there some other way of doing it?
Upvotes: 0
Views: 941
Reputation: 56
As noted above, you don't need to worry about finding the best split yourself. But if you have lots of float attributes, evaluating every candidate value can be computationally expensive. In that case, have a look at discretization algorithms for decision trees.
So in your example, when a binary discretization technique is applied, your continuous attribute is transformed into a discrete binary attribute. There are many discretization techniques that use different approaches to find the best threshold between the classes (in your case, 0.5).
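A minimal sketch of what binary discretization produces: each continuous value is mapped to one of two labels using a threshold. The threshold 0.5 below is just the example from the question; an actual discretization algorithm would search for the threshold that best separates the classes.

```python
def discretize(values, threshold=0.5):
    """Map each continuous value in [0, 1] to 'LOW' or 'HIGH'.

    The threshold is a hypothetical example value; a real
    discretization algorithm would choose it from the data.
    """
    return ["LOW" if v < threshold else "HIGH" for v in values]

print(discretize([0.1, 0.7, 0.5, 0.49]))  # → ['LOW', 'HIGH', 'HIGH', 'LOW']
```

After this transformation, the attribute behaves exactly like the binary MALE/FEMALE case from the question.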
Upvotes: 1
Reputation: 397
Why do you need to split it on your own? I'm not sure I understood correctly, but the purpose of a decision tree is exactly what you seem to be doing by hand.
For a given feature F (let's take the case of a continuous attribute) whose values lie within (a, b) (which can be ]-∞, +∞[), the decision tree looks for the best* value V to split your node into two separate leaves. Data then belong to the first leaf if the attribute F is within (a, V), and to the second leaf if it is within (V, b).
What does best* mean?
There are multiple ways to find the value V, but generally speaking it is chosen so that the purity (the literature term) of each leaf is maximal, meaning that the data inside each leaf are as homogeneous as possible. Wikipedia lists some commonly used metrics for splitting a parent leaf into two child leaves.
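The search for V described above can be sketched as follows, using Gini impurity as one common purity metric (the answer does not name a specific metric; the midpoint-candidate strategy and the example data are assumptions for illustration):

```python
def gini(labels):
    """Gini impurity of a list of class labels (0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    """Return the threshold V that minimizes the weighted impurity
    of the two leaves, trying midpoints between consecutive sorted values."""
    pairs = sorted(zip(xs, ys))
    best_v, best_score = None, float("inf")
    for (a, _), (b, _) in zip(pairs, pairs[1:]):
        if a == b:
            continue
        v = (a + b) / 2  # candidate threshold between consecutive values
        left = [y for x, y in pairs if x < v]
        right = [y for x, y in pairs if x >= v]
        # weighted average of the two leaves' impurities
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_v, best_score = v, score
    return best_v

xs = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]  # continuous attribute F
ys = ["A", "A", "A", "B", "B", "B"]  # class labels
print(best_split(xs, ys))  # → 0.5
```

Here the split at V = 0.5 gives two perfectly pure leaves (impurity 0), so the tree would pick it automatically; no manual LOW/HIGH binning is needed.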
Upvotes: 1