Clement Ros

Reputation: 329

Hyperparameter tuning

I'm currently working on a personal project in which I compare the results of multiple algorithms. I want to be sure that every algorithm tested is configured to give its best results.

So I use cross-validation to test every combination of parameters and choose the best one.

For example:

    from sklearn.cluster import KMeans
    from sklearn.model_selection import GridSearchCV, ShuffleSplit

    def KMeanstest(param_grid, n_jobs):
        estimator = KMeans()
        cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=42)
        # KMeans has no target, so GridSearchCV scores each candidate with
        # KMeans.score (negative inertia) on the held-out split
        grid_search = GridSearchCV(estimator=estimator, cv=cv,
                                   param_grid=param_grid, n_jobs=n_jobs)
        grid_search.fit(X_train, y_train)  # y_train is ignored by KMeans

        print("Best Estimator learned through GridSearch")
        print(grid_search.best_estimator_)

        return cv, grid_search.best_estimator_

    # note: precompute_distances and the n_jobs parameter were removed from
    # KMeans in scikit-learn 1.0; drop them from the grid on recent versions
    param_grid = {'n_clusters': [2],
                  'init': ['k-means++', 'random'],
                  'max_iter': [100, 200, 300, 400, 500],
                  'n_init': [8, 9, 10, 11, 12, 13, 14, 15, 16],
                  'tol': [1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6],
                  'precompute_distances': ['auto', True, False],
                  'random_state': [42],
                  'copy_x': [True, False],
                  'n_jobs': [-1],
                  'algorithm': ['auto', 'full', 'elkan']
                 }

    n_jobs = -1

    cv, best_est = KMeanstest(param_grid, n_jobs)

But this is very time-consuming. I want to know whether this method is the best, or whether I should use a different approach.

Thank you for your help

Upvotes: 1

Views: 2295

Answers (3)

Shaunak Sen

Reputation: 578

The problem with GridSearch is that it is very time-consuming as you have rightly said. RandomSearch can be a good option sometimes, but it is not optimal.

Bayesian optimization is another option. It lets us rapidly zero in on the optimal parameter set using a probabilistic approach. I have tried it personally using the hyperopt library in Python and it works really well. Check out this tutorial for more information. You can also download the associated notebook from my GitHub.

The good thing is that since you have already experimented with GridSearch, you have a rough idea of which parameter ranges do not work well. So you can define a more accurate search space for the Bayesian Optimization to run on, and this will reduce the time even more. Also, hyperopt can be used to compare multiple algorithms and their respective parameters.

Upvotes: 3

Poe Dator

Reputation: 4893

Besides random search and grid search, there are tools and libraries for more intelligent hyperparameter tuning. I have used Optuna successfully, but there are a few more out there.

Upvotes: 3

Amit Gupta

Reputation: 2918

You can try random search in place of grid search. Random search is a technique in which random combinations of the hyperparameters are used to find the best configuration for the model: instead of enumerating every combination, the objective is evaluated at some number of randomly sampled points in the parameter space.

You can find the details on the sklearn documentation page, which includes a comparison between random search and grid search.
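Applied to the question's setup, this is a near drop-in change: swap GridSearchCV for RandomizedSearchCV and cap the number of fits with n_iter (the data and the trimmed parameter space below are illustrative).

```python
from scipy.stats import loguniform
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.model_selection import RandomizedSearchCV, ShuffleSplit

# Synthetic data standing in for the real training set
X, _ = make_blobs(n_samples=500, centers=3, random_state=42)

# Lists are sampled uniformly; scipy distributions are sampled continuously
param_distributions = {
    'n_clusters': [2, 3, 4, 5, 6],
    'init': ['k-means++', 'random'],
    'tol': loguniform(1e-6, 1e-1),
}

cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=42)
search = RandomizedSearchCV(KMeans(n_init=10), param_distributions,
                            n_iter=25, cv=cv, n_jobs=-1, random_state=42)
search.fit(X)  # scored with KMeans.score (negative inertia) on each split
print(search.best_params_)
```

With n_iter=25 this costs 250 fits (25 candidates x 10 splits), a fixed budget, whereas the grid in the question multiplies out to thousands of candidates before the splits are even counted.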

I hope you find this useful.

Upvotes: 1
