Reputation: 1996
I hope you can help
I've been trying to tune my random forest model using the randomized search function (RandomizedSearchCV) in scikit-learn.
As shown below, I have supplied several candidate values for max depth and for min samples per leaf.
# Create a base model
model = RandomForestClassifier()
# Instantiate the random search model
best = RandomizedSearchCV(model, {
    'bootstrap': [True, False],
    'max_depth': [80, 90, 100, 110],
    'min_samples_leaf': [3, 4, 5]
}, cv=5, return_train_score=True, iid=True, n_iter=4)
best.fit(train_features, train_labels.ravel())
print(best.best_score_)
print(best)
But when I run this, I get the output below, where max_depth and min_samples_leaf are set to values not in my lists.
What am I doing wrong here?
RandomizedSearchCV(cv=5, error_score='raise',
estimator=RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
**max_depth=None**, max_features='auto', max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
**min_samples_leaf=1**, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
oob_score=False, random_state=None, verbose=0,
warm_start=False),
fit_params=None, iid=True, n_iter=4, n_jobs=1,
param_distributions={'bootstrap': [True, False], 'max_depth': [80, 90, 100, 110], 'min_samples_leaf': [3, 4, 5]},
pre_dispatch='2*n_jobs', random_state=None, refit=True,
return_train_score=True, scoring=None, verbose=0)
Upvotes: 1
Views: 4411
Reputation: 60321
Your chosen name for your RandomizedSearchCV object, best, is actually a misnomer: best will contain all the parameters, and not only the best ones, including the parameters of your RF model, some of which will actually be overridden during the randomized search. So print(best), as expected, gives exactly this result, i.e. all the parameter values, including the default ones of RF, which will not actually be used here (they will be overridden by the values in parameters).
What you should ask instead is
print(best.best_params_)
for the best found parameters, and
print(best.best_estimator_)
for the whole RF model with the best parameters found.
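Note that these attributes exist only after fit() has been called. A minimal sketch of what each one returns, using hypothetical toy data and the question's object name best (best_score_, which the question also prints, is the mean cross-validated score of best_params_):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Hypothetical toy data, just to make the sketch runnable
X, y = make_classification(n_samples=100, random_state=0)

best = RandomizedSearchCV(RandomForestClassifier(n_estimators=10),
                          {'max_depth': [80, 90, 100, 110],
                           'min_samples_leaf': [3, 4, 5]},
                          cv=5, n_iter=4)
best.fit(X, y)

print(best.best_params_)     # only the sampled values that won
print(best.best_score_)      # mean cross-validated score of best_params_
print(best.best_estimator_)  # the refitted RF carrying those values
```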
Here is a reproducible example using the iris data (and the name clf instead of best):
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets
from sklearn.model_selection import RandomizedSearchCV
iris = datasets.load_iris()
parameters = {
    'bootstrap': [True, False],
    'max_depth': [80, 90, 100, 110],
    'min_samples_leaf': [3, 4, 5]
}
model = RandomForestClassifier()
clf = RandomizedSearchCV(model, parameters, cv=5, return_train_score=True, iid=True, n_iter=4)
clf.fit(iris.data, iris.target)
Notice that the default console output of this last fit command, even without any print request, will be:
RandomizedSearchCV(cv=5, error_score='raise-deprecating',
estimator=RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=None, max_features='auto', max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators='warn', n_jobs=None,
oob_score=False, random_state=None, verbose=0,
warm_start=False),
fit_params=None, iid=True, n_iter=4, n_jobs=None,
param_distributions={'max_depth': [80, 90, 100, 110], 'bootstrap': [True, False], 'min_samples_leaf': [3, 4, 5]},
pre_dispatch='2*n_jobs', random_state=None, refit=True,
return_train_score=True, scoring=None, verbose=0)
which is essentially the same as the one you report (and which I have explained above): just the default values of your RF model (since you have not specified any parameters for model), plus the parameters grid. To get the specific parameter set selected, you should use
clf.best_params_
# {'bootstrap': True, 'max_depth': 90, 'min_samples_leaf': 5}
and asking for clf.best_estimator_ confirms that we indeed get an RF with these exact parameter values:
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=90, max_features='auto', max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=5, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=None,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
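Finally, since RandomizedSearchCV samples its candidates at random, repeated runs will generally select different parameters. A sketch of the same iris setup that pins the sampling with hypothetical seed values, and also inspects every sampled candidate (not just the winner) via cv_results_ (the deprecated iid argument, later removed from scikit-learn, is omitted here):

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

iris = datasets.load_iris()
parameters = {
    'bootstrap': [True, False],
    'max_depth': [80, 90, 100, 110],
    'min_samples_leaf': [3, 4, 5]
}

# Fixing random_state on both the model and the search makes the
# sampled candidates (and hence best_params_) reproducible
model = RandomForestClassifier(n_estimators=10, random_state=0)
clf = RandomizedSearchCV(model, parameters, cv=5, n_iter=4, random_state=42)
clf.fit(iris.data, iris.target)

# cv_results_ holds one entry per sampled candidate (n_iter=4 here)
for params, score in zip(clf.cv_results_['params'],
                         clf.cv_results_['mean_test_score']):
    print(params, round(score, 3))
```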
Upvotes: 5