A_D

Reputation: 705

Maximum Likelihood Estimation in sklearn

In my project, I am using GridSearchCV in sklearn to exhaustively search over specified parameter values for a model and find the best ones. I just tested it with RandomForestClassifier, and it helped me find the best max_depth and n_estimators. Based on that, I have two questions:

  1. Does GridSearchCV use the concept of Maximum Likelihood Estimation (MLE) under the hood?
  2. Instead of using GridSearchCV for every model, is there a technique that I can use to choose the best model for my dataset? I think this is under the concept of model selection but I don't know how to use it via sklearn.

Thank you

Upvotes: 1

Views: 10805

Answers (1)

lejlot

Reputation: 66805

Does GridSearchCV use the concept of Maximum Likelihood Estimation (MLE) under the hood?

MLE is a form of probabilistic reasoning, so it can only be applied to probabilistic models. GridSearchCV is not MLE-based; it is a simple trick for doing model selection based on direct estimation of the test error. Given a particular model, it assigns a number that represents how good that model is. Given many models, you simply select the one with the biggest number (the highest estimated generalization strength).
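To make that concrete, here is a minimal sketch of what "assign a number and pick the biggest" looks like in practice. The synthetic dataset, the small parameter grid, and the choice of accuracy as the score are assumptions for illustration, not anything taken from the question's project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data stands in for the real dataset (assumption for illustration).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Hypothetical grid: each combination is one candidate "model".
param_grid = {"max_depth": [2, 4], "n_estimators": [10, 30]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,                # 3-fold cross-validation estimates the test error
    scoring="accuracy",  # the "number" assigned to each candidate
)
search.fit(X, y)

# One mean cross-validated score per candidate; the best one wins.
for params, score in zip(search.cv_results_["params"],
                         search.cv_results_["mean_test_score"]):
    print(params, round(score, 3))

print("selected:", search.best_params_, search.best_score_)
```

No likelihood is maximized anywhere in this procedure: the only quantity compared across candidates is the held-out score.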

Instead of using GridSearchCV for every model, is there a technique that I can use to choose the best model for my dataset? I think this is under the concept of model selection but I don't know how to use it via sklearn.

There are plenty; however, sklearn pretty much implements only various train-test splitters (CV, random, etc.). You might want to consider other libraries that support:

  • Bayesian optimization
  • Tree-structured Parzen Estimator (TPE) technique

These are more advanced methods of looking for good hyperparameters (rather than just checking values from a grid specified in advance).
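If you want to stay within sklearn, the same train-test splitting machinery can also be used to compare different model families directly. The following is only a rough sketch under assumptions: the three candidate estimators, their settings, and the synthetic data are hypothetical choices, and this is plain cross-validated comparison, not Bayesian optimization or TPE.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data stands in for the real dataset (assumption for illustration).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Hypothetical candidates; any sklearn estimators could be listed here.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=30, random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=5),
}

# Mean 3-fold cross-validated accuracy for each candidate.
mean_scores = {
    name: cross_val_score(model, X, y, cv=3).mean()
    for name, model in candidates.items()
}

# Select the candidate with the highest estimated generalization score.
best_name = max(mean_scores, key=mean_scores.get)
print(mean_scores)
print("selected:", best_name)
```

Each candidate here uses fixed hyperparameters; in practice you would likely tune each one (for example with GridSearchCV) before comparing them.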

Upvotes: 3
