Sourav Saha

Reputation: 101

Linear Regression vs Random Forest performance accuracy

If a dataset has a mix of categorical and continuous features, a decision tree is better than linear regression, since trees can split the data accurately on the categorical variables. Is there any situation where linear regression outperforms a random forest?

Upvotes: 6

Views: 13771

Answers (2)

user2564741

Reputation: 76

Key advantages of linear models over tree-based ones are:

  • they can extrapolate (e.g. if the labels in the training set lie between 1 and 5, a tree-based model will never predict 10, but a linear model can)
  • they can be used for anomaly detection, because of that extrapolation
  • interpretability (yes, tree-based models have feature importances, but these are only a proxy; the weights of a linear model are easier to read directly)
  • they need less data to get good results
  • they have strong online-learning implementations (Vowpal Wabbit), which is crucial when working with giant datasets with many features (e.g. text)
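
To make the extrapolation point concrete, here is a minimal pure-Python sketch. It is not a random forest: `fit_tree` is a hypothetical, hand-rolled single regression tree on one feature, and `fit_line` is plain least squares. The toy data (labels 1 to 5) mirrors the example in the first bullet. A forest averages trees of this kind, so its predictions are also bounded by the training labels.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x on one feature; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def fit_tree(pairs):
    """Greedy regression tree on sorted (x, y) pairs.

    A leaf is a float (mean label); an internal node is a tuple
    (threshold, left_subtree, right_subtree).
    """
    ys = [y for _, y in pairs]
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        cost = sse(ys[:i]) + sse(ys[i:])
        if best is None or cost < best[0]:
            best = (cost, i)
    if best is None or best[0] >= sse(ys):
        return sum(ys) / len(ys)  # leaf: no split reduces the error
    i = best[1]
    threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    return (threshold, fit_tree(pairs[:i]), fit_tree(pairs[i:]))


def predict_tree(node, x):
    """Walk down the tree until a leaf (a float) is reached."""
    while isinstance(node, tuple):
        threshold, left, right = node
        node = left if x <= threshold else right
    return node


# Toy data (an assumption for illustration): labels lie between 1 and 5.
xs = [1, 2, 3, 4, 5]
ys = [1, 2, 3, 4, 5]

a, b = fit_line(xs, ys)
tree = fit_tree(sorted(zip(xs, ys)))

line_pred = a + b * 10             # the line continues the trend
tree_pred = predict_tree(tree, 10)  # the tree returns its right-most leaf

print("linear prediction at x=10:", line_pred)
print("tree prediction at x=10:  ", tree_pred)
```

On this data the line predicts 10 at x=10, while the tree predicts 5, the largest label it saw during training.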

Upvotes: 3

kutschkem

Reputation: 8163

There certainly have to be situations where linear regression outperforms random forests, but I think the more important thing to consider is the complexity of the model.

Linear models have very few parameters; random forests have far more. That means a random forest will overfit more easily than a linear regression.
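
A rough sketch of that complexity argument, under stated assumptions: the data is a made-up noisy line (`true_f`, with alternating +1/-1 noise), and the high-capacity model is `nearest_label`, a hypothetical stand-in that returns the label of the nearest training point. That is what a single fully grown one-feature regression tree with midpoint thresholds reduces to. It is not a random forest, and averaging many trees usually softens this effect.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; two parameters in total."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def nearest_label(train_x, train_y, x):
    """Memorizing model: one stored value per training point."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]


def mse(preds, targets):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def true_f(x):
    """Assumed noise-free relationship (illustrative only)."""
    return 3 * x


# Training set: 8 points on the line, with deterministic +1/-1 noise.
train_x = list(range(8))
noise = [1 if i % 2 == 0 else -1 for i in range(8)]
train_y = [true_f(x) + e for x, e in zip(train_x, noise)]

# Held-out points between the training points, scored against the true line.
test_x = [k + 0.25 for k in range(7)]
test_y = [true_f(x) for x in test_x]

a, b = fit_line(train_x, train_y)
line_train = mse([a + b * x for x in train_x], train_y)
line_test = mse([a + b * x for x in test_x], test_y)

memo_train = mse([nearest_label(train_x, train_y, x) for x in train_x], train_y)
memo_test = mse([nearest_label(train_x, train_y, x) for x in test_x], test_y)

print("line:      train MSE", line_train, " test MSE", line_test)
print("memorizer: train MSE", memo_train, " test MSE", memo_test)
```

The memorizer reproduces the training labels exactly, noise included, so its training error is zero while its held-out error is higher than the line's. The two-parameter line cannot chase the noise, so its training error is worse but it generalizes better here.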

Upvotes: 4
