Edison

Reputation: 11987

Configuration of GridSearchCV for AdaBoost and its base learner

I'm running grid search on AdaBoost with DecisionTreeClassifier as its base learner to get the best parameters for AdaBoost and DecisionTree.

The search on a dataset of shape (130000, 22) has been running for 18 hours, so I'm wondering whether this is just a typical day of waiting for training or whether there is an issue with the setup.

Are the base learner, grid search, training and parameters set up correctly?

import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Grid over the base tree (base_estimator__*) and over AdaBoost itself.
ada_params = {"base_estimator__criterion" : ["gini", "entropy"],
              "base_estimator__splitter" :   ["best", "random"],
              "base_estimator__min_samples_leaf": [*np.arange(100,1500,100)],
              "base_estimator__max_depth": [5,10,13,15],
              "base_estimator__max_features": [5,10,15],
              "n_estimators": [500, 700, 1000, 1500],
              "learning_rate": [0.001, 0.01, 0.1, 0.3]
}

dt_base_learner = DecisionTreeClassifier(random_state = 42, max_features="auto", class_weight = "balanced")
ada_clf = AdaBoostClassifier(base_estimator = dt_base_learner)

# kf, scaled_X_train and y_train are defined earlier (not shown here).
ada_search = GridSearchCV(ada_clf, param_grid=ada_params, scoring = 'f1', cv=kf)
ada_search.fit(scaled_X_train, y_train)

Upvotes: 1

Views: 394

Answers (2)

Victor Villacorta

Reputation: 617

GridSearchCV will not finish until every fit for every parameter combination is done. Check the RandomizedSearchCV documentation instead: start with a small number of sampled combinations (n_iter) and increase it a few at a time, and set n_jobs=-1 to parallelize as much as possible.

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html
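
A minimal sketch of that suggestion, run on a small synthetic dataset so it finishes in seconds. The dataset, the reduced value lists, cv=3 and n_iter=4 are illustrative assumptions and not the asker's setup. The parameter prefix is detected at runtime because newer scikit-learn releases call the AdaBoost argument estimator instead of base_estimator.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for scaled_X_train / y_train (assumption: binary target).
X, y = make_classification(n_samples=300, n_features=22, random_state=0)

# "estimator" in newer scikit-learn, "base_estimator" in older releases.
prefix = "estimator" if "estimator" in AdaBoostClassifier().get_params() else "base_estimator"

tree = DecisionTreeClassifier(random_state=42, class_weight="balanced")
ada_clf = AdaBoostClassifier(**{prefix: tree})

# Small illustrative value lists; scale them up for the real data.
param_distributions = {
    f"{prefix}__criterion": ["gini", "entropy"],
    f"{prefix}__min_samples_leaf": [5, 10, 20],
    f"{prefix}__max_depth": [3, 5],
    f"{prefix}__max_features": [5, 10, 15],
    "n_estimators": [10, 20],
    "learning_rate": [0.01, 0.1, 0.3],
}

search = RandomizedSearchCV(
    ada_clf,
    param_distributions=param_distributions,
    n_iter=4,         # number of sampled combinations; raise gradually
    scoring="f1",
    cv=3,
    n_jobs=-1,        # use all available cores
    random_state=42,
)
search.fit(X, y)
print(search.best_params_)
```

With this approach the total cost is n_iter times the number of CV splits, regardless of how large the underlying grid is.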

Upvotes: 0

vogelstein

Reputation: 493

If I am not mistaken, your GridSearchCV tests 2 * 2 * 14 * 4 * 3 * 4 * 4 = 10,752 different model configurations (criterion, splitter, min_samples_leaf, max_depth, max_features, n_estimators, learning_rate), each cross-validated over an unknown number of splits, and each fit trains up to 1,500 trees. You should definitely reduce the number of combinations in the GridSearchCV, or switch to RandomizedSearchCV or to BayesSearchCV from skopt.
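
The size of the grid can be checked directly with scikit-learn's ParameterGrid. The sketch below counts all seven keys in the question's grid, including the two-valued criterion and splitter. The number of CV splits is not given in the question, so n_splits = 5 is only an assumption for illustration.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Same grid as in the question.
ada_params = {
    "base_estimator__criterion": ["gini", "entropy"],
    "base_estimator__splitter": ["best", "random"],
    "base_estimator__min_samples_leaf": [*np.arange(100, 1500, 100)],
    "base_estimator__max_depth": [5, 10, 13, 15],
    "base_estimator__max_features": [5, 10, 15],
    "n_estimators": [500, 700, 1000, 1500],
    "learning_rate": [0.001, 0.01, 0.1, 0.3],
}

n_combinations = len(ParameterGrid(ada_params))
n_splits = 5  # assumption: the question does not show how kf is defined
total_fits = n_combinations * n_splits

print(n_combinations)  # 10752
print(total_fits)      # 53760
```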

Upvotes: 0
