stackoverflowuser2010

Reputation: 40969

Are decision trees (e.g. C4.5) considered nonparametric learning?

I am relatively new to machine learning and am trying to place decision tree induction into the grand scheme of things. Are decision trees (for example, those built with C4.5 or ID3) considered parametric or nonparametric? I would guess that they may indeed be parametric, because the split points for real-valued features may be determined from some statistic of the feature values' distribution, for example the mean. However, they do not share the nonparametric characteristic of having to keep all the original training data (as one would with kNN).

Upvotes: 9

Views: 8030

Answers (2)

marc

Reputation: 355

The term parametric refers to how the number of parameters of the model relates to the amount of training data.

If the number of parameters is fixed, the model is parametric.

If the number of parameters grows with the data, the model is nonparametric.

A decision tree is nonparametric, but if you cap its size for regularization, then the number of parameters is also capped and could be considered fixed. So it's not that clear-cut for decision trees.
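To make that concrete, here is a rough sketch rather than a real C4.5 implementation: a toy 1-D tree that splits at the median until every leaf is pure. The names `build_tree`, `count_splits`, and `make_data` are made up for this illustration, and the median split stands in for C4.5's information-gain criterion. On worst-case data with alternating labels, the number of split thresholds (the tree's parameters) grows with the training set unless a depth cap is applied.

```python
def build_tree(points, max_depth=None, depth=0):
    """Grow a binary tree on 1-D (x, label) pairs until every leaf is pure.

    Assumption: splits at the median instead of using information gain.
    """
    labels = {label for _, label in points}
    if len(labels) == 1 or (max_depth is not None and depth >= max_depth):
        # Leaf: predict the most frequent label among the remaining points.
        majority = max(labels, key=lambda l: sum(1 for _, y in points if y == l))
        return {"leaf": majority}
    points = sorted(points)
    mid = len(points) // 2
    threshold = (points[mid - 1][0] + points[mid][0]) / 2
    return {
        "threshold": threshold,
        "left": build_tree(points[:mid], max_depth, depth + 1),
        "right": build_tree(points[mid:], max_depth, depth + 1),
    }


def count_splits(tree):
    """Count internal nodes, i.e. the learned split thresholds."""
    if "leaf" in tree:
        return 0
    return 1 + count_splits(tree["left"]) + count_splits(tree["right"])


def make_data(n):
    """Worst case for a tree: labels alternate along the x axis."""
    return [(x, x % 2) for x in range(n)]


for n in (8, 64, 512):
    unbounded = count_splits(build_tree(make_data(n)))
    capped = count_splits(build_tree(make_data(n), max_depth=3))
    print(n, unbounded, capped)
```

With the depth cap, the split count stops growing once the tree is full, which is the sense in which a size-limited tree could be treated as having a fixed number of parameters.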

KNN is definitely nonparametric because the parameter set is the data set: to predict new data points, the KNN model needs access to the training data points and nothing else (except the hyperparameter K).
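A minimal sketch of that point, assuming 1-D inputs and absolute distance (the `KNN` class here is hypothetical, not any library's API): "fitting" does nothing except store the training pairs, so what the model keeps is exactly as large as the data set.

```python
from collections import Counter


class KNN:
    """Toy 1-D k-nearest-neighbours classifier (illustrative only)."""

    def __init__(self, k=3):
        self.k = k  # the only hyperparameter

    def fit(self, xs, ys):
        # No parameters are estimated: the stored data *is* the model.
        self.data = list(zip(xs, ys))
        return self

    def predict(self, x):
        # Take the k stored points closest to x and vote on their labels.
        nearest = sorted(self.data, key=lambda p: abs(p[0] - x))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


model = KNN(k=3).fit(
    [1.0, 1.2, 0.8, 5.0, 5.3, 4.9],
    ["a", "a", "a", "b", "b", "b"],
)
print(model.predict(1.1), model.predict(5.1), len(model.data))
```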

Upvotes: 5

bogatron

Reputation: 19179

The term "parametric" refers to parameters that define the distribution of the data. Since decision trees such as C4.5 don't make an assumption regarding the distribution of the data, they are nonparametric. Gaussian Maximum Likelihood Classification (GMLC) is parametric because it assumes the data follow a multivariate Gaussian distribution (classes are characterized by means and covariances). With regard to your last sentence, retaining the training data (e.g., instance-based learning) is not common to all nonparametric classifiers. For example, artificial neural networks (ANN) are considered nonparametric but they do not retain the training data.
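For contrast, a rough 1-D sketch of a Gaussian maximum likelihood classifier (the function names are illustrative, and the univariate case stands in for the multivariate one): each class is summarized by a mean and a variance, so the model size depends on the number of classes, not on how many training points there are.

```python
import math
from statistics import mean, pvariance


def fit_gaussian_classes(xs, ys):
    """Estimate a (mean, variance) pair per class by maximum likelihood."""
    params = {}
    for label in set(ys):
        values = [x for x, y in zip(xs, ys) if y == label]
        # pvariance divides by n, which is the maximum likelihood estimate.
        params[label] = (mean(values), pvariance(values))
    return params


def log_likelihood(x, mu, var):
    """Log density of a univariate Gaussian with mean mu and variance var."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def predict(params, x):
    """Pick the class whose fitted Gaussian gives x the highest likelihood."""
    return max(params, key=lambda label: log_likelihood(x, *params[label]))


params = fit_gaussian_classes(
    [1.0, 1.2, 0.8, 5.0, 5.3, 4.9],
    ["a", "a", "a", "b", "b", "b"],
)
print(predict(params, 1.1), predict(params, 4.5), len(params))
```

After fitting, the training data can be discarded; only the per-class means and variances are needed for prediction.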

Upvotes: 10
