Reputation: 3874
Jiawei Han's book on Data Mining, 2nd edition (Attribute Selection Measures, pp. 297–300), explains how to calculate the information gain achieved by each attribute (age, income, credit_rating) with respect to the class (buys_computer: yes or no). In that example, every attribute value is discrete: for example, age can be youth/middle-aged/senior, income can be high/medium/low, credit_rating fair/excellent, and so on.
I would like to know how the same information gain can be applied to attributes that take non-discrete values. For example, the income attribute could take any currency amount, like 100.68, 120.90, and so on. If there are 1000 students, there could be 1000 different income values.
How can we apply the same information gain over non-discrete data? Any tutorial/sample example/video URL would be of great help.
Upvotes: 7
Views: 6574
Reputation: 66886
When your target variable is discrete (categorical), you just calculate the entropy of the empirical class distribution in each of the left/right splits you're considering, and compare their weighted average to the entropy without the split.
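For concreteness, here is a minimal Python sketch of that calculation; the function names and the toy yes/no labels are mine, purely for illustration:

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical class distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy before the split minus the weighted average entropy after it."""
    n = len(parent)
    weighted = (len(left) / n) * shannon_entropy(left) \
             + (len(right) / n) * shannon_entropy(right)
    return shannon_entropy(parent) - weighted

# Toy yes/no target, as in the buys_computer example
parent = ["yes", "yes", "yes", "no", "no", "no"]
left, right = ["yes", "yes", "no"], ["yes", "no", "no"]
print(information_gain(parent, left, right))  # ~0.082 bits
```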
For a continuous target variable, like income, the analogous quantity is differential entropy. For your purpose you would assume that the values in your set follow a normal distribution and calculate the differential entropy accordingly. From Wikipedia, the differential entropy of a normal distribution with variance σ² is

h(X) = ½ ln(2πeσ²)

That is, it's just a function of the variance of the values. Note that this is in nats, not bits, of entropy. To compare it to the Shannon entropy above, you'd have to convert, which is just a multiplication by log₂(e) (equivalently, division by ln 2).
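A small Python sketch of the same idea (the income values are made up, and the function names are mine):

```python
import math

def normal_differential_entropy(values):
    """Differential entropy (in nats) of a Gaussian fit to the values:
    0.5 * ln(2 * pi * e * sigma^2)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # ML estimate of sigma^2
    return 0.5 * math.log(2 * math.pi * math.e * var)

def nats_to_bits(h):
    """Convert entropy from nats to bits (divide by ln 2)."""
    return h / math.log(2)

# Made-up income values
incomes = [100.68, 120.90, 98.50, 135.20, 110.00]
h = normal_differential_entropy(incomes)
print(h, nats_to_bits(h))
```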
Upvotes: 8
Reputation: 496
The most common way to split on a continuous variable (1-D) is to pick a threshold (from a discretized set of candidate thresholds, or you can choose a prior). So you can compute the information gain for a continuous attribute by first sorting its values (you need an ordering) and then scanning for the threshold with the best gain. http://dilekylmzr.files.wordpress.com/2011/09/data-mining_lecture9.ppt
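Here is a minimal Python sketch of that sort-and-scan procedure; the income values, yes/no labels, and function names are invented for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Sort the values, try the midpoint between each pair of consecutive
    distinct values as a threshold, and return the one with the highest
    information gain."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    parent_entropy = entropy(labels)
    best_gain, best_t = -1.0, None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]   # values <= t
        right = [lab for _, lab in pairs[i:]]  # values > t
        gain = parent_entropy \
             - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain

incomes = [100.68, 120.90, 98.50, 135.20, 110.00, 150.75]
buys = ["no", "yes", "no", "yes", "no", "yes"]
print(best_threshold(incomes, buys))  # (115.45, 1.0): a perfect split
```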
Example of using this technique in random forests
This technique is often used in random forests (and decision trees), so I will post a few references on that.
More information on random forests and this technique can be found here: http://www.cs.ubc.ca/~nando/540-2013/lectures.html. Watch the lectures on YouTube, because the slides alone are not very informative. The lecture describes how to match body parts using random forests in Kinect, so it is quite interesting. The original paper discussed in the lecture is here: https://research.microsoft.com/pubs/145347/bodypartrecognition.pdf
Note that for information gain you can also use Gaussian entropy: basically, you fit a Gaussian to the data before and after the split and compare the differential entropies.
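A short sketch of that idea in Python, assuming it means comparing the differential entropy of a Gaussian fit before the split to the weighted entropies of Gaussians fit to each side (the sample values are made up, and each side needs nonzero variance):

```python
import math

def normal_entropy(values):
    """Differential entropy (nats) of a Gaussian fit to the values."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return 0.5 * math.log(2 * math.pi * math.e * var)

def gaussian_gain(left, right):
    """Entropy of a Gaussian fit to all values, minus the weighted
    entropies of Gaussians fit to each side of the split."""
    parent = left + right
    n = len(parent)
    return normal_entropy(parent) \
         - (len(left) / n) * normal_entropy(left) \
         - (len(right) / n) * normal_entropy(right)

# Made-up continuous values split at some threshold
print(gaussian_gain([98.50, 100.68, 110.00], [120.90, 135.20, 150.75]))
```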
Upvotes: 6