Decision tree entropy calculation target

Question

I found several examples of two types.

Single feature

Given a data with only two items classes. For example only blue and yellow balls. I.e. we have only one feature in this case is color. This is clear example to show "divide and conquer" rule applicable to entropy. But this is senselessly for any prediction or classification problems because if we have an object with only one feature and the value is known we don't need a tree to decide that "this ball is yellow".

Multiple features

Given a data with multiple features and a feature to predict (known for training data). We can calculate a predicate based on minimum average entropy for each feature. Closer to life, isn't it? It was clear to me until I haven't tried to implement the algorithm.

And now I have a collision in my mind.

If we calculate entropy relatively to a known features (one per node) we will have meaningful results at classification with a tree only if unknown feature is strictly dependent from every known feature. Otherwise a single unbound known feature could break all prediction driving a decision in a wrong way. But if we calculate entropy relatively to a values of the feature which we want to predict at classification we are returned to the first senseless example. In this way there is no difference which of a known feature to use for a node...

And a question about a tree building process.

Should I calculate entropy only for known features and just believe that all the known features are bound to an unknown? Or maybe I should calculate entropy for unknown feature (known for training data) TOO to determine which feature more affects result?

Decision tree entropy calculation target

Single feature

Multiple features

Answers (1)

Related Questions