rohit choudhari

Reputation: 5

TPOT for hyperparameter tuning

I want to use TPOT for hyperparameter tuning of a model. I know that TPOT can give me the best machine learning pipeline with the best hyperparameters, but in my case I already have a pipeline and I just want to tune its parameters.

My pipeline is as follows:

from sklearn.pipeline import make_pipeline
from sklearn.linear_model import SGDRegressor
from sklearn.feature_selection import SelectPercentile, f_regression
from tpot.builtins import StackingEstimator, OneHotEncoder
from xgboost import XGBRegressor

exported_pipeline = make_pipeline(
    StackingEstimator(estimator=SGDRegressor(alpha=0.001, eta0=0.1, fit_intercept=False, l1_ratio=1.0, learning_rate="constant", loss="epsilon_insensitive", penalty="elasticnet", power_t=10.0)),
    SelectPercentile(score_func=f_regression, percentile=90),
    OneHotEncoder(minimum_fraction=0.2, sparse=False, threshold=10),
    XGBRegressor(learning_rate=0.1, max_depth=10, min_child_weight=1, n_estimators=100, n_jobs=1, objective="reg:squarederror", subsample=0.45, verbosity=0),
)

Please tell me how to tune the hyperparameters, and if this is not possible in TPOT, please suggest an alternative library for it. Thank you.

Upvotes: 0

Views: 396

Answers (2)

Hao Li

Reputation: 26

  1. TPOT optimizes pipelines and hyperparameters together. Since it uses a genetic algorithm, you can run it several times with different random seeds to see if it finds a better [pipeline + set of hyperparameters] combination, or try different population settings.

  2. If you don't want the pipeline to change, import it into scikit-learn and use something similar to TPOT. You can easily tune the hyperparameters of a scikit-learn Pipeline.

Here is an example: https://medium.com/@kocur4d/hyper-parameter-tuning-with-pipelines-5310aff069d6 (search the page with Ctrl+F for "grid_params" to see how it is configured). You can even export the tuning grid from TPOT into your pipeline.

If the pipeline is not big (and you have the tuning dictionaries), use GridSearchCV; see the sketch below.
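
For instance, here is a minimal sketch of tuning the question's fixed pipeline with GridSearchCV. The step names below are the ones make_pipeline auto-generates from the lowercased class names; the grid values, X_train, and y_train are illustrative placeholders, not recommendations:

from sklearn.model_selection import GridSearchCV

# make_pipeline names each step after its lowercased class name,
# so parameters are addressed as "<step>__<param>".
param_grid = {
    'stackingestimator__estimator__alpha': [0.0001, 0.001, 0.01],
    'selectpercentile__percentile': [70, 80, 90, 100],
    'xgbregressor__max_depth': [4, 6, 8, 10],
    'xgbregressor__subsample': [0.45, 0.6, 0.8, 1.0],
}

search = GridSearchCV(
    exported_pipeline,  # the fixed pipeline from the question
    param_grid,
    cv=5,
    scoring='neg_mean_squared_error',
    n_jobs=-1,
)
search.fit(X_train, y_train)  # X_train/y_train: your training data
print(search.best_params_, search.best_score_)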

If the pipeline is big or the hyperparameter space has many options, consider NatureInspiredSearchCV (https://sklearn-nature-inspired-algorithms.readthedocs.io/en/latest/introduction/nature-inspired-search-cv.html). It has a similar interface to GridSearchCV, you can use its runs setting to configure multiple independent runs, and you can adjust the population settings to keep it from getting stuck in local optima.
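
A rough sketch of the same search with NatureInspiredSearchCV follows. The import path and the algorithm, population_size, max_n_gen, and runs arguments are taken from the library's docs and may differ between versions, so treat this as an outline rather than a drop-in snippet:

from sklearn_nature_inspired_algorithms.model_selection import NatureInspiredSearchCV

# Reuses exported_pipeline and param_grid from the GridSearchCV sketch above.
search = NatureInspiredSearchCV(
    exported_pipeline,
    param_grid,
    cv=5,
    algorithm='hba',     # hybrid bat algorithm
    population_size=50,  # larger populations explore more of the space
    max_n_gen=25,        # generations per run
    runs=3,              # independent runs, as mentioned above
    scoring='neg_mean_squared_error',
)
search.fit(X_train, y_train)
print(search.best_params_)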

Upvotes: 1

Peter VanWylen

Reputation: 11

TPOT is designed to search for pipelines and tune hyperparameters at the same time. If you have a fixed pipeline and just want to tune its parameters, try hyperopt, Optuna, or GridSearchCV; an Optuna sketch follows below.
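
For example, here is a minimal Optuna sketch that keeps the question's pipeline fixed and tunes only its XGBRegressor step; the search ranges are illustrative assumptions, and X_train/y_train are placeholders for your data:

import optuna
from sklearn.base import clone
from sklearn.model_selection import cross_val_score

def objective(trial):
    # Suggest values for the XGBoost step only; the pipeline
    # structure itself stays fixed.
    params = {
        'xgbregressor__max_depth': trial.suggest_int('max_depth', 2, 12),
        'xgbregressor__subsample': trial.suggest_float('subsample', 0.4, 1.0),
        'xgbregressor__learning_rate': trial.suggest_float('learning_rate', 1e-3, 0.5, log=True),
    }
    model = clone(exported_pipeline).set_params(**params)  # pipeline from the question
    return cross_val_score(model, X_train, y_train, cv=5,
                           scoring='neg_mean_squared_error').mean()

study = optuna.create_study(direction='maximize')  # maximize negative MSE
study.optimize(objective, n_trials=50)
print(study.best_params)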

If you're somewhat flexible on the pipeline and you really want to use TPOT, you could always use a custom configuration like this and then set config_dict=custom_regression_config when calling TPOTRegressor:

import numpy as np  # the grid below uses np.arange

custom_regression_config = {

    'xgboost.XGBRegressor': {
        'n_estimators': [100],
        'max_depth': range(1, 11),
        'learning_rate': [1e-3, 1e-2, 1e-1, 0.5, 1.],
        'subsample': np.arange(0.05, 1.01, 0.05),
        'min_child_weight': range(1, 21),
        'n_jobs': [1],
        'verbosity': [0],
        'objective': ['reg:squarederror']
    },

    'sklearn.linear_model.SGDRegressor': {
        'loss': ['squared_loss', 'huber', 'epsilon_insensitive'],
        'penalty': ['elasticnet'],
        'alpha': [0.0, 0.01, 0.001],
        'learning_rate': ['invscaling', 'constant'],
        'fit_intercept': [True, False],
        'l1_ratio': [0.25, 0.0, 1.0, 0.75, 0.5],
        'eta0': [0.1, 1.0, 0.01],
        'power_t': [0.5, 0.0, 1.0, 0.1, 100.0, 10.0, 50.0]
    },

    'tpot.builtins.OneHotEncoder': {
        'minimum_fraction': [0.05, 0.1, 0.15, 0.2, 0.25],
        'sparse': [False],
        'threshold': [10]
    },

    'sklearn.feature_selection.SelectPercentile': {
        'percentile': range(1, 100),
        'score_func': {
            'sklearn.feature_selection.f_regression': None
        }
    }

}
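
For completeness, here is a minimal sketch of wiring this config into TPOT; the generations and population_size values are arbitrary examples, and X_train/y_train are placeholders:

from tpot import TPOTRegressor

tpot = TPOTRegressor(
    config_dict=custom_regression_config,  # restrict the search to the operators above
    generations=5,
    population_size=20,
    verbosity=2,
    random_state=42,
)
tpot.fit(X_train, y_train)
tpot.export('tuned_pipeline.py')  # writes the best pipeline found as a Python script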

Be aware, however, that this could arrive at pipelines where the order and stacking are different. That said, maybe that's a good thing if you let TPOT innovate not only on the hyperparameters but also on the exact shape of the pipeline.

Upvotes: 1
